Overview

Dataset statistics

Number of variables: 47
Number of observations: 356206
Missing cells: 3238222
Missing cells (%): 19.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 127.7 MiB
Average record size in memory: 376.0 B
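
The derived figures in this block can be cross-checked from the raw counts. The following is a minimal sketch, assuming that "missing cells (%)" is missing cells divided by variables times observations, and that 1 MiB is 1024 × 1024 bytes; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_variables = 47
n_observations = 356206
missing_cells = 3238222
total_size_mib = 127.7

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The record size computed this way lands slightly below the reported 376.0 B, which is expected because the 127.7 MiB input is itself rounded.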

Variable types

Numeric: 9
Categorical: 34
Boolean: 2
Unsupported: 2

Warnings

ori has constant value "CA0371100" Constant
agency has constant value "SD" Constant
date_stop has a high cardinality: 731 distinct values High cardinality
time_stop has a high cardinality: 75362 distinct values High cardinality
address_city has a high cardinality: 51 distinct values High cardinality
beat_name has a high cardinality: 127 distinct values High cardinality
highway_exit has a high cardinality: 2181 distinct values High cardinality
address_street has a high cardinality: 45016 distinct values High cardinality
intersection has a high cardinality: 13841 distinct values High cardinality
school_name has a high cardinality: 99 distinct values High cardinality
reason_for_stop_code_text has a high cardinality: 1566 distinct values High cardinality
reason_for_stop_explanation has a high cardinality: 170749 distinct values High cardinality
basis_for_search_explanation has a high cardinality: 26425 distinct values High cardinality
result_text has a high cardinality: 1260 distinct values High cardinality
is_school is highly correlated with is_student High correlation
is_student is highly correlated with is_school High correlation
gender_nc is highly correlated with gender_non_conforming High correlation
gender_non_conforming is highly correlated with gender_nc High correlation
stop_duration is highly correlated with school_name and 1 other field High correlation
stop_in_response_to_cfs is highly correlated with school_name and 2 other fields High correlation
address_city is highly correlated with school_name and 1 other field High correlation
code is highly correlated with result and 4 other fields High correlation
result is highly correlated with code and 6 other fields High correlation
officer_assignment_key is highly correlated with assignment and 2 other fields High correlation
stop_id is highly correlated with school_name and 1 other field High correlation
gender_nc is highly correlated with gender_non_conforming and 1 other field High correlation
beat is highly correlated with school_name and 1 other field High correlation
consented is highly correlated with school_name and 1 other field High correlation
perceived_age is highly correlated with school_name and 1 other field High correlation
perceived_limited_english is highly correlated with school_name High correlation
perceived_gender is highly correlated with school_name and 2 other fields High correlation
is_student is highly correlated with is_school and 1 other field High correlation
type_of_property_seized is highly correlated with code and 5 other fields High correlation
action is highly correlated with result and 5 other fields High correlation
assignment is highly correlated with officer_assignment_key and 2 other fields High correlation
reason_for_stop is highly correlated with result and 5 other fields High correlation
gender_non_conforming is highly correlated with gender_nc and 1 other field High correlation
is_school is highly correlated with is_student High correlation
school_name is highly correlated with stop_duration and 23 other fields High correlation
landmark is highly correlated with stop_duration and 14 other fields High correlation
basis_for_search is highly correlated with result and 5 other fields High correlation
basis_for_property_seizure is highly correlated with result and 6 other fields High correlation
exp_years is highly correlated with school_name and 1 other field High correlation
perceived_lgbt is highly correlated with school_name High correlation
gender is highly correlated with perceived_gender and 2 other fields High correlation
gender2 is highly correlated with gender_nc and 3 other fields High correlation
race is highly correlated with school_name and 1 other field High correlation
reason_for_stop_detail is highly correlated with stop_in_response_to_cfs and 5 other fields High correlation
disability is highly correlated with school_name and 1 other field High correlation
highway_exit has 353273 (99.2%) missing values Missing
address_street has 14608 (4.1%) missing values Missing
intersection has 324647 (91.1%) missing values Missing
landmark has 356163 (> 99.9%) missing values Missing
school_name has 355829 (99.9%) missing values Missing
reason_for_stop_code_text has 20564 (5.8%) missing values Missing
reason_for_stop_detail has 20559 (5.8%) missing values Missing
consented has 350521 (98.4%) missing values Missing
basis_for_search has 282703 (79.4%) missing values Missing
basis_for_search_explanation has 302561 (84.9%) missing values Missing
basis_for_property_seizure has 348426 (97.8%) missing values Missing
type_of_property_seized has 348426 (97.8%) missing values Missing
result_text has 159727 (44.8%) missing values Missing
address_block is highly skewed (γ1 = 307.2329112) Skewed
landmark is uniformly distributed Uniform
reason_for_stop_code is an unsupported type, check if it needs cleaning or further analysis Unsupported
result_key is an unsupported type, check if it needs cleaning or further analysis Unsupported
address_block has 34105 (9.6%) zeros Zeros
code has 159725 (44.8%) zeros Zeros

Reproduction

Analysis started: 2021-09-13 13:55:43.234778
Analysis finished: 2021-09-13 13:58:32.417863
Duration: 2 minutes and 49.18 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
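
The reported duration follows from the two timestamps. A small sketch using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2021-09-13 13:55:43.234778")
finished = datetime.fromisoformat("2021-09-13 13:58:32.417863")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```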

Variables

stop_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 309924
Distinct (%): 87.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 165513.0811
Minimum: 2443
Maximum: 324716
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 2443
5-th percentile: 18644
Q1: 83999.25
Median: 166968.5
Q3: 245620.75
95-th percentile: 309461.75
Maximum: 324716
Range: 322273
Interquartile range (IQR): 161621.5

Descriptive statistics

Standard deviation: 93278.73671
Coefficient of variation (CV): 0.5635731999
Kurtosis: -1.188300054
Mean: 165513.0811
Median Absolute Deviation (MAD): 80807.5
Skewness: -0.03088392276
Sum: 5.895675255 × 10^10
Variance: 8700922723
Monotonicity: Increasing
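
Several of the statistics above are simple functions of the others. The sketch below recomputes them from the reported values, assuming the usual definitions (CV as standard deviation over mean, variance as standard deviation squared, IQR as Q3 minus Q1); the report itself does not spell out these formulas.

```python
# Values copied from the stop_id section of the report.
mean = 165513.0811
std = 93278.73671
minimum, maximum = 2443, 324716
q1, q3 = 83999.25, 245620.75

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.0f}")
print(f"Range: {value_range}")
print(f"IQR: {iqr}")
```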
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
174011                 52      < 0.1%
184085                 48      < 0.1%
180326                 46      < 0.1%
169932                 42      < 0.1%
183655                 40      < 0.1%
161095                 39      < 0.1%
174472                 38      < 0.1%
236965                 35      < 0.1%
170316                 34      < 0.1%
183646                 32      < 0.1%
Other values (309914)  355800  99.9%

Value   Count  Frequency (%)
2443    1      < 0.1%
2444    1      < 0.1%
2447    2      < 0.1%
2448    1      < 0.1%
2449    1      < 0.1%
2451    1      < 0.1%
2453    1      < 0.1%
2454    1      < 0.1%
2455    1      < 0.1%
2456    1      < 0.1%

Value   Count  Frequency (%)
324716  1      < 0.1%
324715  1      < 0.1%
324712  3      < 0.1%
324701  1      < 0.1%
324697  1      < 0.1%
324691  1      < 0.1%
324690  1      < 0.1%
324687  1      < 0.1%
324683  1      < 0.1%
324682  1      < 0.1%

date_stop
Categorical

HIGH CARDINALITY

Distinct: 731
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB
2020-02-12          799
2019-05-23          793
2020-02-11          791
2019-07-06          755
2020-01-16          749
Other values (726)  352319

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 3562060
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2018-07-01
2nd row: 2018-07-01
3rd row: 2018-07-01
4th row: 2018-07-01
5th row: 2018-07-01

Common Values

Value               Count   Frequency (%)
2020-02-12          799     0.2%
2019-05-23          793     0.2%
2020-02-11          791     0.2%
2019-07-06          755     0.2%
2020-01-16          749     0.2%
2019-10-23          734     0.2%
2019-09-24          733     0.2%
2018-07-03          730     0.2%
2019-08-21          722     0.2%
2018-08-02          719     0.2%
Other values (721)  348681  97.9%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0874623
24.6%
-712412
20.0%
2641691
18.0%
1584745
16.4%
9251550
 
7.1%
8158150
 
4.4%
382920
 
2.3%
769638
 
2.0%
463737
 
1.8%
563731
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2849648
80.0%
Dash Punctuation712412
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0874623
30.7%
2641691
22.5%
1584745
20.5%
9251550
 
8.8%
8158150
 
5.5%
382920
 
2.9%
769638
 
2.4%
463737
 
2.2%
563731
 
2.2%
658863
 
2.1%
Dash Punctuation
ValueCountFrequency (%)
-712412
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3562060
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0874623
24.6%
-712412
20.0%
2641691
18.0%
1584745
16.4%
9251550
 
7.1%
8158150
 
4.4%
382920
 
2.3%
769638
 
2.0%
463737
 
1.8%
563731
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII3562060
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0874623
24.6%
-712412
20.0%
2641691
18.0%
1584745
16.4%
9251550
 
7.1%
8158150
 
4.4%
382920
 
2.3%
769638
 
2.0%
463737
 
1.8%
563731
 
1.8%

time_stop
Categorical

HIGH CARDINALITY

Distinct: 75362
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB
16:00:00
 
1023
15:00:00
 
889
11:00:00
 
831
10:00:00
 
827
09:00:00
 
819
Other values (75357)
351817 

Length

Max length: 19
Median length: 8
Mean length: 8.000277929
Min length: 8
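
The length statistics suggest that almost every time_stop value is an 8-character HH:MM:SS string, with a handful of 19-character outliers. The sketch below checks that reading against the character total reported for this variable and shows one possible normalization. It assumes the long values look like "YYYY-MM-DD HH:MM:SS", which would fit the few dash and space characters counted in this section but is not shown directly in the report; `normalize_time` is a hypothetical helper.

```python
n_rows = 356206
total_chars = 2849747  # "Total characters" reported for time_stop

# If every value had 8 characters there would be n_rows * 8 characters.
extra_chars = total_chars - n_rows * 8
# Each 19-character value contributes 11 characters beyond the usual 8.
n_long_values = extra_chars // (19 - 8)
mean_length = total_chars / n_rows


def normalize_time(value: str) -> str:
    """Hypothetical helper: keep only the trailing HH:MM:SS part."""
    return value.strip()[-8:]


print(n_long_values, mean_length)
# Illustrative input, not taken from the data.
print(normalize_time("2018-07-01 00:01:37"))
```

Slicing the last eight characters leaves well-formed values unchanged, so the helper could be applied to the whole column before parsing it as a time.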

Characters and Unicode

Total characters2849747
Distinct characters13
Distinct categories4 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique16998 ?
Unique (%)4.8%

Sample

1st row00:01:37
2nd row00:03:34
3rd row00:05:43
4th row00:05:43
5th row00:19:06

Common Values

ValueCountFrequency (%)
16:00:001023
 
0.3%
15:00:00889
 
0.2%
11:00:00831
 
0.2%
10:00:00827
 
0.2%
09:00:00819
 
0.2%
17:00:00808
 
0.2%
08:00:00792
 
0.2%
15:30:00784
 
0.2%
22:00:00773
 
0.2%
16:30:00685
 
0.2%
Other values (75352)347975
97.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
16:00:001023
 
0.3%
15:00:00889
 
0.2%
11:00:00831
 
0.2%
10:00:00827
 
0.2%
09:00:00819
 
0.2%
17:00:00808
 
0.2%
08:00:00792
 
0.2%
15:30:00784
 
0.2%
22:00:00773
 
0.2%
16:30:00685
 
0.2%
Other values (75346)347984
97.7%

Most occurring characters

ValueCountFrequency (%)
:712412
25.0%
0604584
21.2%
1367409
12.9%
2255487
 
9.0%
5199866
 
7.0%
3193422
 
6.8%
4170620
 
6.0%
889801
 
3.2%
787398
 
3.1%
986012
 
3.0%
Other values (3)82736
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2137308
75.0%
Other Punctuation712412
 
25.0%
Dash Punctuation18
 
< 0.1%
Space Separator9
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0604584
28.3%
1367409
17.2%
2255487
12.0%
5199866
 
9.4%
3193422
 
9.0%
4170620
 
8.0%
889801
 
4.2%
787398
 
4.1%
986012
 
4.0%
682709
 
3.9%
Other Punctuation
ValueCountFrequency (%)
:712412
100.0%
Dash Punctuation
ValueCountFrequency (%)
-18
100.0%
Space Separator
ValueCountFrequency (%)
9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2849747
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
:712412
25.0%
0604584
21.2%
1367409
12.9%
2255487
 
9.0%
5199866
 
7.0%
3193422
 
6.8%
4170620
 
6.0%
889801
 
3.2%
787398
 
3.1%
986012
 
3.0%
Other values (3)82736
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII2849747
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
:712412
25.0%
0604584
21.2%
1367409
12.9%
2255487
 
9.0%
5199866
 
7.0%
3193422
 
6.8%
4170620
 
6.0%
889801
 
3.2%
787398
 
3.1%
986012
 
3.0%
Other values (3)82736
 
2.9%

stop_duration
Real number (ℝ≥0)

HIGH CORRELATION

Distinct347
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean28.12754979
Minimum1
Maximum1440
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.7 MiB

Quantile statistics

Minimum1
5-th percentile5
Q110
median15
Q325
95-th percentile120
Maximum1440
Range1439
Interquartile range (IQR)15

Descriptive statistics

Standard deviation49.89945582
Coefficient of variation (CV)1.77404204
Kurtosis200.9267233
Mean28.12754979
Median Absolute Deviation (MAD)7
Skewness9.801072454
Sum10019202
Variance2489.955691
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1080722
22.7%
1544719
12.6%
541047
11.5%
2036312
10.2%
3024646
 
6.9%
6014652
 
4.1%
812212
 
3.4%
12011204
 
3.1%
611159
 
3.1%
78529
 
2.4%
Other values (337)71004
19.9%
ValueCountFrequency (%)
11092
 
0.3%
22683
 
0.8%
32971
 
0.8%
42555
 
0.7%
541047
11.5%
611159
 
3.1%
78529
 
2.4%
812212
 
3.4%
93206
 
0.9%
1080722
22.7%
ValueCountFrequency (%)
144051
< 0.1%
14221
 
< 0.1%
14191
 
< 0.1%
140018
 
< 0.1%
13511
 
< 0.1%
13502
 
< 0.1%
13451
 
< 0.1%
13381
 
< 0.1%
13303
 
< 0.1%
13011
 
< 0.1%

stop_in_response_to_cfs
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
0
315052 
1
41154 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters356206
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row1
4th row1
5th row0

Common Values

ValueCountFrequency (%)
0315052
88.4%
141154
 
11.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0315052
88.4%
141154
 
11.6%

Most occurring characters

ValueCountFrequency (%)
0315052
88.4%
141154
 
11.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number356206
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0315052
88.4%
141154
 
11.6%

Most occurring scripts

ValueCountFrequency (%)
Common356206
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0315052
88.4%
141154
 
11.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII356206
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0315052
88.4%
141154
 
11.6%

address_city
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct51
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
SAN DIEGO
350508 
SAN YSIDRO
 
1857
CHULA VISTA
 
656
NATIONAL CITY
 
646
EL CAJON
 
409
Other values (46)
 
2130

Length

Max length36
Median length9
Mean length9.01588126
Min length4

Characters and Unicode

Total characters3211511
Distinct characters25
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique10 ?
Unique (%)< 0.1%

Sample

1st rowSAN DIEGO
2nd rowSAN DIEGO
3rd rowSAN DIEGO
4th rowSAN DIEGO
5th rowSAN DIEGO

Common Values

ValueCountFrequency (%)
SAN DIEGO350508
98.4%
SAN YSIDRO1857
 
0.5%
CHULA VISTA656
 
0.2%
NATIONAL CITY646
 
0.2%
EL CAJON409
 
0.1%
LEMON GROVE360
 
0.1%
LA MESA265
 
0.1%
LA JOLLA228
 
0.1%
ESCONDIDO223
 
0.1%
SANTEE212
 
0.1%
Other values (41)842
 
0.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
san352533
49.5%
diego350511
49.3%
ysidro1857
 
0.3%
vista658
 
0.1%
chula656
 
0.1%
city647
 
0.1%
national646
 
0.1%
la493
 
0.1%
cajon409
 
0.1%
el409
 
0.1%
Other values (57)2852
 
0.4%

Most occurring characters

ValueCountFrequency (%)
A357649
11.1%
S356149
11.1%
355465
11.1%
O355404
11.1%
N355344
11.1%
I355097
11.1%
E353266
11.0%
D352997
11.0%
G351020
10.9%
L3866
 
0.1%
Other values (15)15254
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter2856043
88.9%
Space Separator355465
 
11.1%
Dash Punctuation3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A357649
12.5%
S356149
12.5%
O355404
12.4%
N355344
12.4%
I355097
12.4%
E353266
12.4%
D352997
12.4%
G351020
12.3%
L3866
 
0.1%
R2871
 
0.1%
Other values (13)12380
 
0.4%
Space Separator
ValueCountFrequency (%)
355465
100.0%
Dash Punctuation
ValueCountFrequency (%)
-3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2856043
88.9%
Common355468
 
11.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A357649
12.5%
S356149
12.5%
O355404
12.4%
N355344
12.4%
I355097
12.4%
E353266
12.4%
D352997
12.4%
G351020
12.3%
L3866
 
0.1%
R2871
 
0.1%
Other values (13)12380
 
0.4%
Common
ValueCountFrequency (%)
355465
> 99.9%
-3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII3211511
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A357649
11.1%
S356149
11.1%
355465
11.1%
O355404
11.1%
N355344
11.1%
I355097
11.1%
E353266
11.0%
D352997
11.0%
G351020
10.9%
L3866
 
0.1%
Other values (15)15254
 
0.5%

beat
Real number (ℝ≥0)

HIGH CORRELATION

Distinct126
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean509.0492524
Minimum111
Maximum999
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.7 MiB

Quantile statistics

Minimum111
5-th percentile121
Q1315
median521
Q3627
95-th percentile931
Maximum999
Range888
Interquartile range (IQR)312

Descriptive statistics

Standard deviation241.5275701
Coefficient of variation (CV)0.4744679792
Kurtosis-0.7689366337
Mean509.0492524
Median Absolute Deviation (MAD)193
Skewness-0.09464900063
Sum181326398
Variance58335.56713
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
52129119
 
8.2%
12222763
 
6.4%
61113630
 
3.8%
5249670
 
2.7%
5129647
 
2.7%
7129206
 
2.6%
8138780
 
2.5%
6278067
 
2.3%
6147933
 
2.2%
3137794
 
2.2%
Other values (116)229597
64.5%
ValueCountFrequency (%)
1114189
 
1.2%
1121434
 
0.4%
1131338
 
0.4%
1143225
 
0.9%
1153740
 
1.0%
1163814
 
1.1%
1217741
 
2.2%
12222763
6.4%
1234283
 
1.2%
1243539
 
1.0%
ValueCountFrequency (%)
9997690
2.2%
937814
 
0.2%
936270
 
0.1%
935621
 
0.2%
9344407
1.2%
9331384
 
0.4%
932349
 
0.1%
9312927
 
0.8%
841608
 
0.2%
839948
 
0.3%

beat_name
Categorical

HIGH CARDINALITY

Distinct127
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
East Village 521
29119 
Pacific Beach 122
 
22763
Midway District 611
 
13630
Core-Columbia 524
 
9670
Logan Heights 512
 
9647
Other values (122)
271377 

Length

Max length25
Median length16
Mean length15.93640758
Min length10

Characters and Unicode

Total characters5676644
Distinct characters64
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPacific Beach 122
2nd rowMission Beach 121
3rd rowEl Cerrito 822
4th rowEl Cerrito 822
5th rowOcean Beach 614
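
The sample rows suggest that each beat_name is a neighbourhood name followed by the numeric beat code held in the separate beat variable. A minimal sketch of splitting the two, assuming every label ends with a space and an integer (only the rows shown in this report were checked); `split_beat_name` is a hypothetical helper:

```python
from typing import Tuple


def split_beat_name(beat_name: str) -> Tuple[str, int]:
    """Split a label such as 'Pacific Beach 122' into (name, beat number)."""
    name, number = beat_name.rsplit(" ", 1)
    return name, int(number)


# Sample rows copied from the beat_name section.
samples = [
    "Pacific Beach 122",
    "Mission Beach 121",
    "El Cerrito 822",
    "El Cerrito 822",
    "Ocean Beach 614",
]
parsed = [split_beat_name(s) for s in samples]
for name, number in parsed:
    print(f"{number}: {name}")
```

Since beat reports 126 distinct values and beat_name reports 127, a split like this could help locate the beat number that appears under more than one name.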

Common Values

ValueCountFrequency (%)
East Village 52129119
 
8.2%
Pacific Beach 12222763
 
6.4%
Midway District 61113630
 
3.8%
Core-Columbia 5249670
 
2.7%
Logan Heights 5129647
 
2.7%
San Ysidro 7129206
 
2.6%
North Park 8138780
 
2.5%
Hillcrest 6278067
 
2.3%
Ocean Beach 6147933
 
2.2%
Kearney Mesa 3137794
 
2.2%
Other values (117)229597
64.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
east44811
 
4.3%
beach38437
 
3.7%
park36946
 
3.6%
village31791
 
3.1%
mesa31005
 
3.0%
52129119
 
2.8%
12222763
 
2.2%
pacific22763
 
2.2%
mission22093
 
2.1%
heights20500
 
2.0%
Other values (263)732552
70.9%

Most occurring characters

ValueCountFrequency (%)
676574
 
11.9%
a496721
 
8.8%
i330103
 
5.8%
e328709
 
5.8%
1274078
 
4.8%
l257350
 
4.5%
2243157
 
4.3%
s236973
 
4.2%
r235903
 
4.2%
t228362
 
4.0%
Other values (54)2368714
41.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3223309
56.8%
Decimal Number1068618
 
18.8%
Uppercase Letter691872
 
12.2%
Space Separator676574
 
11.9%
Dash Punctuation9670
 
0.2%
Other Punctuation6601
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a496721
15.4%
i330103
10.2%
e328709
10.2%
l257350
8.0%
s236973
7.4%
r235903
7.3%
t228362
7.1%
o218464
 
6.8%
n198600
 
6.2%
c138663
 
4.3%
Other values (16)553461
17.2%
Uppercase Letter
ValueCountFrequency (%)
M91978
13.3%
P73635
10.6%
B66713
9.6%
C64918
9.4%
V61208
8.8%
E51799
 
7.5%
H43171
 
6.2%
S35559
 
5.1%
L35469
 
5.1%
O22108
 
3.2%
Other values (14)145314
21.0%
Decimal Number
ValueCountFrequency (%)
1274078
25.6%
2243157
22.8%
3129219
12.1%
5112142
10.5%
4106678
 
10.0%
671067
 
6.7%
849692
 
4.7%
745057
 
4.2%
937528
 
3.5%
Other Punctuation
ValueCountFrequency (%)
/4393
66.6%
.1429
 
21.6%
'779
 
11.8%
Space Separator
ValueCountFrequency (%)
676574
100.0%
Dash Punctuation
ValueCountFrequency (%)
-9670
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3915181
69.0%
Common1761463
31.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a496721
 
12.7%
i330103
 
8.4%
e328709
 
8.4%
l257350
 
6.6%
s236973
 
6.1%
r235903
 
6.0%
t228362
 
5.8%
o218464
 
5.6%
n198600
 
5.1%
c138663
 
3.5%
Other values (40)1245333
31.8%
Common
ValueCountFrequency (%)
676574
38.4%
1274078
15.6%
2243157
 
13.8%
3129219
 
7.3%
5112142
 
6.4%
4106678
 
6.1%
671067
 
4.0%
849692
 
2.8%
745057
 
2.6%
937528
 
2.1%
Other values (4)16271
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII5676644
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
676574
 
11.9%
a496721
 
8.8%
i330103
 
5.8%
e328709
 
5.8%
1274078
 
4.8%
l257350
 
4.5%
2243157
 
4.3%
s236973
 
4.2%
r235903
 
4.2%
t228362
 
4.0%
Other values (54)2368714
41.7%

highway_exit
Categorical

HIGH CARDINALITY
MISSING

Distinct2181
Distinct (%)74.4%
Missing353273
Missing (%)99.2%
Memory size2.7 MiB
163 Robinson
 
27
163 robinson
 
27
I15 bernardo center
 
22
I-15 Bernardo Center
 
20
I-805/SR-54
 
19
Other values (2176)
2818 

Length

Max length54
Median length17
Mean length18.39038527
Min length2

Characters and Unicode

Total characters53939
Distinct characters73
Distinct categories9 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1886 ?
Unique (%)64.3%

Sample

1st rowsb 15 @ mercy rd
2nd rowI-15 and carmel mountain
3rd rowI-15 and Bernardo Center
4th rowI-15 and Bernardo Center
5th rowI-15 and Bernardo Center

Common Values

ValueCountFrequency (%)
163 Robinson27
 
< 0.1%
163 robinson27
 
< 0.1%
I15 bernardo center22
 
< 0.1%
I-15 Bernardo Center20
 
< 0.1%
I-805/SR-5419
 
< 0.1%
I-805/43RD STREET17
 
< 0.1%
I-805/PLAZA BOULEVARD16
 
< 0.1%
I15 Bernardo Center15
 
< 0.1%
I-5/VIA DE SAN YSIDRO13
 
< 0.1%
NORTHBOUND INTERSTATE-15/FRIARS ROAD12
 
< 0.1%
Other values (2171)2745
 
0.8%
(Missing)353273
99.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
at738
 
6.8%
455
 
4.2%
15433
 
4.0%
sb360
 
3.3%
nb281
 
2.6%
i-15239
 
2.2%
805226
 
2.1%
and201
 
1.9%
street182
 
1.7%
163181
 
1.7%
Other values (852)7522
69.5%

Most occurring characters

ValueCountFrequency (%)
7994
 
14.8%
52377
 
4.4%
a2177
 
4.0%
A2088
 
3.9%
E1908
 
3.5%
T1877
 
3.5%
R1803
 
3.3%
r1685
 
3.1%
e1632
 
3.0%
I1625
 
3.0%
Other values (63)28773
53.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter18911
35.1%
Lowercase Letter17210
31.9%
Space Separator7994
14.8%
Decimal Number7219
 
13.4%
Other Punctuation1376
 
2.6%
Dash Punctuation1218
 
2.3%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%
Connector Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a2177
12.6%
r1685
9.8%
e1632
9.5%
n1468
 
8.5%
t1406
 
8.2%
o1342
 
7.8%
s1063
 
6.2%
i897
 
5.2%
b886
 
5.1%
l765
 
4.4%
Other values (16)3889
22.6%
Uppercase Letter
ValueCountFrequency (%)
A2088
11.0%
E1908
10.1%
T1877
9.9%
R1803
9.5%
I1625
8.6%
S1442
 
7.6%
N1226
 
6.5%
O1154
 
6.1%
B1149
 
6.1%
D695
 
3.7%
Other values (16)3944
20.9%
Decimal Number
ValueCountFrequency (%)
52377
32.9%
11383
19.2%
0973
13.5%
8848
 
11.7%
6410
 
5.7%
3343
 
4.8%
4315
 
4.4%
9292
 
4.0%
2184
 
2.5%
794
 
1.3%
Other Punctuation
ValueCountFrequency (%)
/997
72.5%
@322
 
23.4%
,23
 
1.7%
.20
 
1.5%
&12
 
0.9%
!2
 
0.1%
Space Separator
ValueCountFrequency (%)
7994
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1218
100.0%
Open Punctuation
ValueCountFrequency (%)
(5
100.0%
Close Punctuation
ValueCountFrequency (%)
)5
100.0%
Connector Punctuation
ValueCountFrequency (%)
_1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin36121
67.0%
Common17818
33.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a2177
 
6.0%
A2088
 
5.8%
E1908
 
5.3%
T1877
 
5.2%
R1803
 
5.0%
r1685
 
4.7%
e1632
 
4.5%
I1625
 
4.5%
n1468
 
4.1%
S1442
 
4.0%
Other values (42)18416
51.0%
Common
ValueCountFrequency (%)
7994
44.9%
52377
 
13.3%
11383
 
7.8%
-1218
 
6.8%
/997
 
5.6%
0973
 
5.5%
8848
 
4.8%
6410
 
2.3%
3343
 
1.9%
@322
 
1.8%
Other values (11)953
 
5.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII53939
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7994
 
14.8%
52377
 
4.4%
a2177
 
4.0%
A2088
 
3.9%
E1908
 
3.5%
T1877
 
3.5%
R1803
 
3.3%
r1685
 
3.1%
e1632
 
3.0%
I1625
 
3.0%
Other values (63)28773
53.3%

address_street
Categorical

HIGH CARDINALITY
MISSING

Distinct45016
Distinct (%)13.2%
Missing14608
Missing (%)4.1%
Memory size2.7 MiB
imperial
 
1535
El Cajon Blvd
 
1516
el cajon blvd
 
1385
imperial ave
 
1355
IMPERIAL AVE
 
1318
Other values (45011)
334489 

Length

Max length100
Median length10
Mean length10.5371782
Min length1

Characters and Unicode

Total characters3599479
Distinct characters80
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique24391 ?
Unique (%)7.1%

Sample

1st rowGrand Avenue
2nd rowNOBEL DRIVE
3rd row59th Street
4th row59th Street
5th rowNIAGARA AVE

Common Values

Value                 Count   Frequency (%)
imperial              1535    0.4%
El Cajon Blvd         1516    0.4%
el cajon blvd         1385    0.4%
imperial ave          1355    0.4%
IMPERIAL AVE          1318    0.4%
garnet                1174    0.3%
university            1100    0.3%
commercial            1080    0.3%
Imperial Ave          1067    0.3%
university ave        1014    0.3%
Other values (45006)  329054  92.4%
(Missing)             14608   4.1%
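
Several of the most common address_street values differ only in letter case, which inflates the distinct count. The sketch below merges the ten listed values after case folding. It assumes that values differing only by case or surrounding whitespace refer to the same street; it does not attempt to reconcile suffix variants such as "imperial" and "imperial ave".

```python
from collections import Counter

# Counts copied from the "Common Values" table for address_street.
common_values = {
    "imperial": 1535,
    "El Cajon Blvd": 1516,
    "el cajon blvd": 1385,
    "imperial ave": 1355,
    "IMPERIAL AVE": 1318,
    "garnet": 1174,
    "university": 1100,
    "commercial": 1080,
    "Imperial Ave": 1067,
    "university ave": 1014,
}

merged = Counter()
for street, count in common_values.items():
    # Assumption: case and outer whitespace carry no meaning here.
    merged[street.strip().casefold()] += count

for street, count in merged.most_common():
    print(f"{street}: {count}")
```

On these ten rows alone, the ten raw values collapse to seven, which hints that part of the high cardinality warning for this column may come from inconsistent data entry.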

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ave41878
 
6.3%
st34275
 
5.1%
street32824
 
4.9%
blvd21740
 
3.3%
avenue15558
 
2.3%
rd11548
 
1.7%
mission10898
 
1.6%
dr10819
 
1.6%
imperial9415
 
1.4%
road8628
 
1.3%
Other values (9886)470444
70.4%

Most occurring characters

ValueCountFrequency (%)
348443
 
9.7%
e254798
 
7.1%
a217775
 
6.1%
t179297
 
5.0%
r173302
 
4.8%
A138706
 
3.9%
n137139
 
3.8%
o126115
 
3.5%
i124213
 
3.5%
l113883
 
3.2%
Other values (70)1785808
49.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1988862
55.3%
Uppercase Letter1161615
32.3%
Space Separator348443
 
9.7%
Decimal Number89892
 
2.5%
Other Punctuation8992
 
0.2%
Dash Punctuation1406
 
< 0.1%
Open Punctuation115
 
< 0.1%
Close Punctuation112
 
< 0.1%
Modifier Symbol40
 
< 0.1%
Math Symbol2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A138706
11.9%
E113500
 
9.8%
R97596
 
8.4%
S96225
 
8.3%
T74721
 
6.4%
N65942
 
5.7%
I65383
 
5.6%
O60555
 
5.2%
L60499
 
5.2%
D53771
 
4.6%
Other values (16)334717
28.8%
Lowercase Letter
ValueCountFrequency (%)
e254798
12.8%
a217775
10.9%
t179297
 
9.0%
r173302
 
8.7%
n137139
 
6.9%
o126115
 
6.3%
i124213
 
6.2%
l113883
 
5.7%
s112033
 
5.6%
v84913
 
4.3%
Other values (16)465394
23.4%
Other Punctuation
ValueCountFrequency (%)
.7333
81.6%
/1075
 
12.0%
#196
 
2.2%
&183
 
2.0%
'78
 
0.9%
@55
 
0.6%
,47
 
0.5%
:14
 
0.2%
;6
 
0.1%
"4
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
118235
20.3%
512989
14.4%
412448
13.8%
39158
10.2%
09137
10.2%
68488
9.4%
76601
 
7.3%
26229
 
6.9%
83828
 
4.3%
92779
 
3.1%
Open Punctuation
ValueCountFrequency (%)
(113
98.3%
[2
 
1.7%
Space Separator
ValueCountFrequency (%)
348443
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1406
100.0%
Close Punctuation
ValueCountFrequency (%)
)112
100.0%
Modifier Symbol
ValueCountFrequency (%)
`40
100.0%
Math Symbol
ValueCountFrequency (%)
=2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3150477
87.5%
Common449002
 
12.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e254798
 
8.1%
a217775
 
6.9%
t179297
 
5.7%
r173302
 
5.5%
A138706
 
4.4%
n137139
 
4.4%
o126115
 
4.0%
i124213
 
3.9%
l113883
 
3.6%
E113500
 
3.6%
Other values (42)1571749
49.9%
Common
ValueCountFrequency (%)
348443
77.6%
118235
 
4.1%
512989
 
2.9%
412448
 
2.8%
39158
 
2.0%
09137
 
2.0%
68488
 
1.9%
.7333
 
1.6%
76601
 
1.5%
26229
 
1.4%
Other values (18)9941
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII3599479
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
348443
 
9.7%
e254798
 
7.1%
a217775
 
6.1%
t179297
 
5.0%
r173302
 
4.8%
A138706
 
3.9%
n137139
 
3.8%
o126115
 
3.5%
i124213
 
3.5%
l113883
 
3.2%
Other values (70)1785808
49.6%

intersection
Categorical

HIGH CARDINALITY
MISSING

Distinct13841
Distinct (%)43.9%
Missing324647
Missing (%)91.1%
Memory size2.7 MiB
CAMINO DE LA PLAZA/ CAMIONES WAY
 
170
BROADWAY
 
138
G Street
 
131
w ash
 
130
I-15
 
112
Other values (13836)
30878 

Length

Max length77
Median length12
Mean length14.14087899
Min length1

Characters and Unicode

Total characters446272
Distinct characters78
Distinct categories9 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique10011 ?
Unique (%)31.7%

Sample

1st rowI-5
2nd rowspace theater way
3rd rowtorrey pines rd
4th rowTOCAYO AVENUE
5th rowMission Blvd

Common Values

Value                               Count    Frequency (%)
CAMINO DE LA PLAZA/ CAMIONES WAY    170      < 0.1%
BROADWAY                            138      < 0.1%
G Street                            131      < 0.1%
w ash                               130      < 0.1%
I-15                                112      < 0.1%
imperial                            98       < 0.1%
torrey pines                        84       < 0.1%
805                                 76       < 0.1%
MARKET                              74       < 0.1%
MISSION                             72       < 0.1%
Other values (13831)                30474    8.6%
(Missing)                           324647   91.1%
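
The values above mix upper case, title case, and lower case ("BROADWAY", "G Street", "imperial"), which probably inflates the distinct count. The sketch below shows one way such free-text values might be normalised before counting. The function name `normalize_street` is hypothetical, and the variants "Broadway " and "g  street" are made up for illustration; they are not taken from the dataset.

```python
import re

def normalize_street(value):
    """Trim, collapse runs of whitespace, and fold case."""
    collapsed = re.sub(r"\s+", " ", value.strip())
    return collapsed.casefold()

# "BROADWAY" and "G Street" appear in the report; the other two are invented variants.
samples = ["BROADWAY", "Broadway ", "G Street", "g  street"]
distinct_after = {normalize_street(s) for s in samples}
print(sorted(distinct_after))
```

Normalisation of this kind does not resolve abbreviations such as "st" versus "street"; that would need a separate mapping.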

Length

Histogram of lengths of the category
ValueCountFrequency (%)
st2762
 
3.3%
and2544
 
3.1%
2509
 
3.0%
ave2505
 
3.0%
street1884
 
2.3%
beach1605
 
1.9%
mission1590
 
1.9%
rd1140
 
1.4%
blvd1096
 
1.3%
dr1030
 
1.2%
Other values (4245)63949
77.4%

Most occurring characters

ValueCountFrequency (%)
53395
 
12.0%
A22325
 
5.0%
a21963
 
4.9%
e20971
 
4.7%
E16811
 
3.8%
r15782
 
3.5%
n14017
 
3.1%
t13587
 
3.0%
R13388
 
3.0%
S13222
 
3.0%
Other values (68)240811
54.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter184104
41.3%
Uppercase Letter180758
40.5%
Space Separator53395
 
12.0%
Decimal Number17420
 
3.9%
Other Punctuation8826
 
2.0%
Dash Punctuation1760
 
0.4%
Open Punctuation4
 
< 0.1%
Close Punctuation4
 
< 0.1%
Math Symbol1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A22325
12.4%
E16811
 
9.3%
R13388
 
7.4%
S13222
 
7.3%
I13116
 
7.3%
N11591
 
6.4%
O10711
 
5.9%
T9025
 
5.0%
C8856
 
4.9%
L8428
 
4.7%
Other values (16)53285
29.5%
Lowercase Letter
ValueCountFrequency (%)
a21963
11.9%
e20971
11.4%
r15782
 
8.6%
n14017
 
7.6%
t13587
 
7.4%
o12365
 
6.7%
i11829
 
6.4%
s10729
 
5.8%
l9584
 
5.2%
d8853
 
4.8%
Other values (16)44424
24.1%
Other Punctuation
ValueCountFrequency (%)
/7826
88.7%
.477
 
5.4%
&173
 
2.0%
@160
 
1.8%
,113
 
1.3%
'65
 
0.7%
!5
 
0.1%
;2
 
< 0.1%
#2
 
< 0.1%
:2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
55246
30.1%
13215
18.5%
02231
12.8%
81751
 
10.1%
61056
 
6.1%
41053
 
6.0%
31027
 
5.9%
2879
 
5.0%
9540
 
3.1%
7422
 
2.4%
Dash Punctuation
ValueCountFrequency (%)
-1760
100.0%
Space Separator
ValueCountFrequency (%)
53395
100.0%
Math Symbol
ValueCountFrequency (%)
=1
100.0%
Open Punctuation
ValueCountFrequency (%)
(4
100.0%
Close Punctuation
ValueCountFrequency (%)
)4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin364862
81.8%
Common81410
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
A22325
 
6.1%
a21963
 
6.0%
e20971
 
5.7%
E16811
 
4.6%
r15782
 
4.3%
n14017
 
3.8%
t13587
 
3.7%
R13388
 
3.7%
S13222
 
3.6%
I13116
 
3.6%
Other values (42)199680
54.7%
Common
ValueCountFrequency (%)
53395
65.6%
/7826
 
9.6%
55246
 
6.4%
13215
 
3.9%
02231
 
2.7%
-1760
 
2.2%
81751
 
2.2%
61056
 
1.3%
41053
 
1.3%
31027
 
1.3%
Other values (16)2850
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII446272
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
53395
 
12.0%
A22325
 
5.0%
a21963
 
4.9%
e20971
 
4.7%
E16811
 
3.8%
r15782
 
3.5%
n14017
 
3.1%
t13587
 
3.0%
R13388
 
3.0%
S13222
 
3.0%
Other values (68)240811
54.0%

address_block
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct280
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5595.259204
Minimum0
Maximum99999900
Zeros34105
Zeros (%)9.6%
Negative0
Negative (%)0.0%
Memory size2.7 MiB

Quantile statistics

Minimum0
5-th percentile0
Q1800
median2700
Q34400
95-th percentile9200
Maximum99999900
Range99999900
Interquartile range (IQR)3600

Descriptive statistics

Standard deviation265547.7147
Coefficient of variation (CV)47.45941252
Kurtosis113168.6982
Mean5595.259204
Median Absolute Deviation (MAD)1800
Skewness307.2329112
Sum1993064900
Variance7.051558878 × 10^10
MonotonicityNot monotonic
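
Several of the figures above are derived from one another. The short check below recomputes the coefficient of variation, the variance, and the interquartile range from the reported mean, standard deviation, and quartiles. It is a sketch using only the numbers printed in this section; the variable names are illustrative.

```python
import math

# Values copied from the address_block statistics above.
mean = 5595.259204
std = 265547.7147
q1, q3 = 800, 4400

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
iqr = q3 - q1            # interquartile range

print(f"CV={cv:.4f}  variance={variance:.4e}  IQR={iqr}")
```

A coefficient of variation this large, next to a median of 2700, indicates that a small number of extreme values dominate the mean.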
Histogram with fixed size bins (bins=50)
Value                 Count     Frequency (%)
0                     34105     9.6%
100                   10861     3.0%
700                   8900      2.5%
3000                  8391      2.4%
4000                  7847      2.2%
1000                  7466      2.1%
800                   7385      2.1%
500                   7311      2.1%
4300                  6649      1.9%
4200                  6601      1.9%
Other values (270)    250690    70.4%

Value                 Count     Frequency (%)
0                     34105     9.6%
100                   10861     3.0%
200                   6290      1.8%
300                   6067      1.7%
400                   5327      1.5%
500                   7311      2.1%
600                   5712      1.6%
700                   8900      2.5%
800                   7385      2.1%
900                   5737      1.6%

Value                 Count     Frequency (%)
99999900              2         < 0.1%
9999900               49        < 0.1%
8004000               1         < 0.1%
999900                157       < 0.1%
520000                1         < 0.1%
133000                1         < 0.1%
122000                2         < 0.1%
121700                1         < 0.1%
121100                1         < 0.1%
120100                1         < 0.1%
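
The largest values (999900, 9999900, 99999900) look like placeholder codes, not real block numbers. The report does not say so; this is an assumption. If it holds, excluding them before summarising would give a more representative mean. The sketch below uses a small made-up sample, and `SUSPECTED_SENTINELS` and `drop_sentinels` are hypothetical names.

```python
from statistics import mean, median

# Assumption: these values are placeholders, not genuine block numbers.
SUSPECTED_SENTINELS = {999900, 9999900, 99999900}

def drop_sentinels(values, sentinels=SUSPECTED_SENTINELS):
    """Return the values with suspected placeholder codes removed."""
    return [v for v in values if v not in sentinels]

# Made-up sample for illustration only.
blocks = [0, 100, 700, 3000, 4300, 999900, 99999900]
cleaned = drop_sentinels(blocks)

print("mean before:", mean(blocks), "mean after:", mean(cleaned))
print("median before:", median(blocks), "median after:", median(cleaned))
```

The median changes little while the mean drops sharply, which matches the gap between the reported median (2700) and mean (5595.259204).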

landmark
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct37
Distinct (%)86.0%
Missing356163
Missing (%)> 99.9%
Memory size2.7 MiB
fiesta island bay
North Cove Park Pacific beach
 
2
Montgomery-Waller Recreation Center
 
2
:Fiesta Island Bay
 
2
5N at Genesee Ave
 
1
Other values (32)
32 

Length

Max length41
Median length17
Mean length19.27906977
Min length8

Characters and Unicode

Total characters829
Distinct characters59
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique33 ?
Unique (%)76.7%

Sample

1st rowFiesta Island
2nd rowCrown Point
3rd rowfiesta island bay
4th rowfiesta island bay
5th row:Fiesta Island Bay

Common Values

Value                                  Count     Frequency (%)
fiesta island bay                      4         < 0.1%
North Cove Park Pacific beach          2         < 0.1%
Montgomery-Waller Recreation Center    2         < 0.1%
:Fiesta Island Bay                     2         < 0.1%
5N at Genesee Ave                      1         < 0.1%
ON TROLLEY IN SANTEE                   1         < 0.1%
Ski beach                              1         < 0.1%
Fiesta Island Bay                      1         < 0.1%
sr905 / i805                           1         < 0.1%
BALBOA PARK - SPANISH VILLAGE          1         < 0.1%
Other values (27)                      27        < 0.1%
(Missing)                              356163    > 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
8
 
5.3%
island8
 
5.3%
at8
 
5.3%
fiesta8
 
5.3%
bay7
 
4.6%
sb5
 
3.3%
park5
 
3.3%
drive4
 
2.6%
cove4
 
2.6%
i-154
 
2.6%
Other values (65)90
59.6%

Most occurring characters

ValueCountFrequency (%)
108
 
13.0%
a46
 
5.5%
e39
 
4.7%
E34
 
4.1%
A34
 
4.1%
R32
 
3.9%
I29
 
3.5%
T28
 
3.4%
t26
 
3.1%
i25
 
3.0%
Other values (49)428
51.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter325
39.2%
Uppercase Letter323
39.0%
Space Separator108
 
13.0%
Decimal Number50
 
6.0%
Dash Punctuation12
 
1.4%
Other Punctuation11
 
1.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E34
10.5%
A34
10.5%
R32
9.9%
I29
9.0%
T28
 
8.7%
S22
 
6.8%
N22
 
6.8%
O20
 
6.2%
B19
 
5.9%
D12
 
3.7%
Other values (12)71
22.0%
Lowercase Letter
ValueCountFrequency (%)
a46
14.2%
e39
12.0%
t26
 
8.0%
i25
 
7.7%
n22
 
6.8%
r22
 
6.8%
s21
 
6.5%
o21
 
6.5%
l16
 
4.9%
d14
 
4.3%
Other values (12)73
22.5%
Decimal Number
ValueCountFrequency (%)
517
34.0%
110
20.0%
95
 
10.0%
04
 
8.0%
83
 
6.0%
63
 
6.0%
43
 
6.0%
32
 
4.0%
22
 
4.0%
71
 
2.0%
Other Punctuation
ValueCountFrequency (%)
/5
45.5%
@4
36.4%
:2
 
18.2%
Space Separator
ValueCountFrequency (%)
108
100.0%
Dash Punctuation
ValueCountFrequency (%)
-12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin648
78.2%
Common181
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a46
 
7.1%
e39
 
6.0%
E34
 
5.2%
A34
 
5.2%
R32
 
4.9%
I29
 
4.5%
T28
 
4.3%
t26
 
4.0%
i25
 
3.9%
n22
 
3.4%
Other values (34)333
51.4%
Common
ValueCountFrequency (%)
108
59.7%
517
 
9.4%
-12
 
6.6%
110
 
5.5%
/5
 
2.8%
95
 
2.8%
04
 
2.2%
@4
 
2.2%
83
 
1.7%
63
 
1.7%
Other values (5)10
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII829
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
108
 
13.0%
a46
 
5.5%
e39
 
4.7%
E34
 
4.1%
A34
 
4.1%
R32
 
3.9%
I29
 
3.5%
T28
 
3.4%
t26
 
3.1%
i25
 
3.0%
Other values (49)428
51.6%

is_school
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
0
355829 
1
 
377

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters356206
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0355829
99.9%
1377
 
0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0355829
99.9%
1377
 
0.1%

Most occurring characters

ValueCountFrequency (%)
0355829
99.9%
1377
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number356206
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0355829
99.9%
1377
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Common356206
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0355829
99.9%
1377
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII356206
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0355829
99.9%
1377
 
0.1%

school_name
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct99
Distinct (%)26.3%
Missing355829
Missing (%)99.9%
Memory size2.7 MiB
Ibarra Elementary (San Diego Unified) 37683380108290
38 
Rancho Bernardo High (Poway Unified) 37682963730819
 
24
Del Norte High (Poway Unified) 37682960118935
 
17
Montgomery Senior High (Sweetwater Union High) 37684113738234
 
17
Torrey Pines High (San Dieguito Union High) 37683463730033
 
16
Other values (94)
265 

Length

Max length69
Median length52
Mean length53.08488064
Min length35

Characters and Unicode

Total characters20013
Distinct characters65
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique39 ?
Unique (%)10.3%

Sample

1st rowRancho Bernardo High (Poway Unified) 37682963730819
2nd rowRancho Bernardo High (Poway Unified) 37682963730819
3rd rowCherokee Point Elementary (San Diego Unified) 37683380108282
4th rowRancho Bernardo High (Poway Unified) 37682963730819
5th rowRancho Bernardo High (Poway Unified) 37682963730819

Common Values

Value                                                            Count     Frequency (%)
Ibarra Elementary (San Diego Unified) 37683380108290             38        < 0.1%
Rancho Bernardo High (Poway Unified) 37682963730819              24        < 0.1%
Del Norte High (Poway Unified) 37682960118935                    17        < 0.1%
Montgomery Senior High (Sweetwater Union High) 37684113738234    17        < 0.1%
Torrey Pines High (San Dieguito Union High) 37683463730033       16        < 0.1%
The O'Farrell Charter (San Diego Unified) 37683386061964         14        < 0.1%
San Ysidro Middle (San Ysidro Elementary) 37683796098453         14        < 0.1%
Canyon Crest Academy (San Dieguito Union High) 37683460106328    10        < 0.1%
San Ysidro High (Sweetwater Union High) 37684113731502           10        < 0.1%
Mt. Carmel High (Poway Unified) 37682963730074                   9         < 0.1%
Other values (89)                                                208       0.1%
(Missing)                                                        355829    99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
san256
 
10.6%
unified253
 
10.5%
high201
 
8.3%
diego166
 
6.9%
elementary155
 
6.4%
poway87
 
3.6%
union82
 
3.4%
middle68
 
2.8%
ysidro55
 
2.3%
sweetwater43
 
1.8%
Other values (236)1053
43.5%

Most occurring characters

ValueCountFrequency (%)
2048
 
10.2%
e1351
 
6.8%
i1330
 
6.6%
31201
 
6.0%
n1086
 
5.4%
a949
 
4.7%
6789
 
3.9%
8747
 
3.7%
o712
 
3.6%
r707
 
3.5%
Other values (55)9093
45.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter9795
48.9%
Decimal Number5289
26.4%
Uppercase Letter2085
 
10.4%
Space Separator2048
 
10.2%
Open Punctuation377
 
1.9%
Close Punctuation377
 
1.9%
Other Punctuation26
 
0.1%
Dash Punctuation16
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S372
17.8%
U337
16.2%
D230
11.0%
H224
10.7%
E164
7.9%
P136
 
6.5%
M132
 
6.3%
C84
 
4.0%
B60
 
2.9%
Y55
 
2.6%
Other values (14)291
14.0%
Lowercase Letter
ValueCountFrequency (%)
e1351
13.8%
i1330
13.6%
n1086
11.1%
a949
9.7%
o712
 
7.3%
r707
 
7.2%
d531
 
5.4%
t469
 
4.8%
g448
 
4.6%
l396
 
4.0%
Other values (14)1816
18.5%
Decimal Number
ValueCountFrequency (%)
31201
22.7%
6789
14.9%
8747
14.1%
7621
11.7%
0594
11.2%
1403
 
7.6%
9364
 
6.9%
2250
 
4.7%
4187
 
3.5%
5133
 
2.5%
Other Punctuation
ValueCountFrequency (%)
'14
53.8%
.10
38.5%
/2
 
7.7%
Space Separator
ValueCountFrequency (%)
2048
100.0%
Open Punctuation
ValueCountFrequency (%)
(377
100.0%
Close Punctuation
ValueCountFrequency (%)
)377
100.0%
Dash Punctuation
ValueCountFrequency (%)
-16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin11880
59.4%
Common8133
40.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1351
 
11.4%
i1330
 
11.2%
n1086
 
9.1%
a949
 
8.0%
o712
 
6.0%
r707
 
6.0%
d531
 
4.5%
t469
 
3.9%
g448
 
3.8%
l396
 
3.3%
Other values (38)3901
32.8%
Common
ValueCountFrequency (%)
2048
25.2%
31201
14.8%
6789
 
9.7%
8747
 
9.2%
7621
 
7.6%
0594
 
7.3%
1403
 
5.0%
(377
 
4.6%
)377
 
4.6%
9364
 
4.5%
Other values (7)612
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII20013
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2048
 
10.2%
e1351
 
6.8%
i1330
 
6.6%
31201
 
6.0%
n1086
 
5.4%
a949
 
4.7%
6789
 
3.9%
8747
 
3.7%
o712
 
3.6%
r707
 
3.5%
Other values (55)9093
45.4%

ori
Categorical

CONSTANT
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
CA0371100
356206 

Length

Max length9
Median length9
Mean length9
Min length9

Characters and Unicode

Total characters3205854
Distinct characters6
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCA0371100
2nd rowCA0371100
3rd rowCA0371100
4th rowCA0371100
5th rowCA0371100

Common Values

ValueCountFrequency (%)
CA0371100356206
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
ca0371100356206
100.0%

Most occurring characters

ValueCountFrequency (%)
01068618
33.3%
1712412
22.2%
C356206
 
11.1%
A356206
 
11.1%
3356206
 
11.1%
7356206
 
11.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2493442
77.8%
Uppercase Letter712412
 
22.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01068618
42.9%
1712412
28.6%
3356206
 
14.3%
7356206
 
14.3%
Uppercase Letter
ValueCountFrequency (%)
C356206
50.0%
A356206
50.0%

Most occurring scripts

ValueCountFrequency (%)
Common2493442
77.8%
Latin712412
 
22.2%

Most frequent character per script

Common
ValueCountFrequency (%)
01068618
42.9%
1712412
28.6%
3356206
 
14.3%
7356206
 
14.3%
Latin
ValueCountFrequency (%)
C356206
50.0%
A356206
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3205854
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01068618
33.3%
1712412
22.2%
C356206
 
11.1%
A356206
 
11.1%
3356206
 
11.1%
7356206
 
11.1%

agency
Categorical

CONSTANT
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
SD
356206 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters712412
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSD
2nd rowSD
3rd rowSD
4th rowSD
5th rowSD

Common Values

ValueCountFrequency (%)
SD356206
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
sd356206
100.0%

Most occurring characters

ValueCountFrequency (%)
S356206
50.0%
D356206
50.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter712412
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S356206
50.0%
D356206
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin712412
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S356206
50.0%
D356206
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII712412
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S356206
50.0%
D356206
50.0%

officer_assignment_key
Real number (ℝ≥0)

HIGH CORRELATION

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.520286014
Minimum1
Maximum10
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.7 MiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median1
Q31
95-th percentile8
Maximum10
Range9
Interquartile range (IQR)0

Descriptive statistics

Standard deviation1.973837466
Coefficient of variation (CV)1.298332977
Kurtosis12.73052062
Mean1.520286014
Median Absolute Deviation (MAD)0
Skewness3.790244382
Sum541535
Variance3.896034343
MonotonicityNot monotonic
Histogram with fixed size bins (bins=10)
Value    Count     Frequency (%)
1        326018    91.5%
10       13466     3.8%
2        7745      2.2%
9        3907      1.1%
7        1933      0.5%
6        766       0.2%
4        749       0.2%
5        705       0.2%
8        561       0.2%
3        356       0.1%

Value    Count     Frequency (%)
1        326018    91.5%
2        7745      2.2%
3        356       0.1%
4        749       0.2%
5        705       0.2%
6        766       0.2%
7        1933      0.5%
8        561       0.2%
9        3907      1.1%
10       13466     3.8%

Value    Count     Frequency (%)
10       13466     3.8%
9        3907      1.1%
8        561       0.2%
7        1933      0.5%
6        766       0.2%
5        705       0.2%
4        749       0.2%
3        356       0.1%
2        7745      2.2%
1        326018    91.5%

assignment
Categorical

HIGH CORRELATION

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
Patrol, traffic enforcement, field operations
326018 
Other
 
13466
Gang enforcement
 
7745
Investigative/detective
 
3907
Task force
 
1933
Other values (5)
 
3137

Length

Max length78
Median length45
Mean length42.29938575
Min length5

Characters and Unicode

Total characters15067295
Distinct characters39
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPatrol, traffic enforcement, field operations
2nd rowPatrol, traffic enforcement, field operations
3rd rowOther
4th rowOther
5th rowPatrol, traffic enforcement, field operations

Common Values

Value                                                                             Count     Frequency (%)
Patrol, traffic enforcement, field operations                                     326018    91.5%
Other                                                                             13466     3.8%
Gang enforcement                                                                  7745      2.2%
Investigative/detective                                                           3907      1.1%
Task force                                                                        1933      0.5%
Narcotics/vice                                                                    766       0.2%
Special events                                                                    749       0.2%
Roadblock or DUI sobriety checkpoint                                              705       0.2%
K1-12 public school inlcuding school resource officer or school police officer    561       0.2%
Compliance check                                                                  356       0.1%
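
The counts for assignment match the counts for officer_assignment_key one for one, which suggests that the numeric key is simply a code for the text label. The sketch below pairs keys with labels by equal counts. This pairing is an inference from the two frequency tables, not a documented code book, and `match_by_count` is a hypothetical helper.

```python
# Counts copied from the officer_assignment_key and assignment tables.
key_counts = {1: 326018, 10: 13466, 2: 7745, 9: 3907, 7: 1933,
              6: 766, 4: 749, 5: 705, 8: 561, 3: 356}
label_counts = {
    "Patrol, traffic enforcement, field operations": 326018,
    "Other": 13466,
    "Gang enforcement": 7745,
    "Investigative/detective": 3907,
    "Task force": 1933,
    "Narcotics/vice": 766,
    "Special events": 749,
    "Roadblock or DUI sobriety checkpoint": 705,
    "K1-12 public school inlcuding school resource officer or school police officer": 561,
    "Compliance check": 356,
}

def match_by_count(key_counts, label_counts):
    """Pair each key with the label that has the same count.

    Raises ValueError if two labels share a count, because the
    pairing would then be ambiguous.
    """
    by_count = {}
    for label, n in label_counts.items():
        if n in by_count:
            raise ValueError(f"ambiguous count {n}")
        by_count[n] = label
    return {key: by_count[n] for key, n in key_counts.items()}

mapping = match_by_count(key_counts, label_counts)
for key in sorted(mapping):
    print(key, "->", mapping[key])
```

With the raw data available, a cross-tabulation of the two columns would confirm or refute this pairing directly.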

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
enforcement333763
19.9%
traffic326018
19.4%
patrol326018
19.4%
field326018
19.4%
operations326018
19.4%
other13466
 
0.8%
gang7745
 
0.5%
investigative/detective3907
 
0.2%
force1933
 
0.1%
task1933
 
0.1%
Other values (17)12672
 
0.8%

Most occurring characters

ValueCountFrequency (%)
e1696199
11.3%
t1343836
8.9%
r1332197
8.8%
o1324568
8.8%
1323285
8.8%
f1315994
8.7%
n1008128
 
6.7%
i997188
 
6.6%
a994215
 
6.6%
c676900
 
4.5%
Other values (29)3054785
20.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter12726736
84.5%
Space Separator1323285
 
8.8%
Other Punctuation656709
 
4.4%
Uppercase Letter358321
 
2.4%
Decimal Number1683
 
< 0.1%
Dash Punctuation561
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1696199
13.3%
t1343836
10.6%
r1332197
10.5%
o1324568
10.4%
f1315994
10.3%
n1008128
7.9%
i997188
7.8%
a994215
7.8%
c676900
 
5.3%
l657212
 
5.2%
Other values (11)1380299
10.8%
Uppercase Letter
ValueCountFrequency (%)
P326018
91.0%
O13466
 
3.8%
G7745
 
2.2%
I4612
 
1.3%
T1933
 
0.5%
N766
 
0.2%
S749
 
0.2%
R705
 
0.2%
D705
 
0.2%
U705
 
0.2%
Other values (2)917
 
0.3%
Other Punctuation
ValueCountFrequency (%)
,652036
99.3%
/4673
 
0.7%
Decimal Number
ValueCountFrequency (%)
11122
66.7%
2561
33.3%
Space Separator
ValueCountFrequency (%)
1323285
100.0%
Dash Punctuation
ValueCountFrequency (%)
-561
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin13085057
86.8%
Common1982238
 
13.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1696199
13.0%
t1343836
10.3%
r1332197
10.2%
o1324568
10.1%
f1315994
10.1%
n1008128
7.7%
i997188
7.6%
a994215
7.6%
c676900
 
5.2%
l657212
 
5.0%
Other values (23)1738620
13.3%
Common
ValueCountFrequency (%)
1323285
66.8%
,652036
32.9%
/4673
 
0.2%
11122
 
0.1%
-561
 
< 0.1%
2561
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII15067295
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1696199
11.3%
t1343836
8.9%
r1332197
8.8%
o1324568
8.8%
1323285
8.8%
f1315994
8.7%
n1008128
 
6.7%
i997188
 
6.6%
a994215
 
6.6%
c676900
 
4.5%
Other values (29)3054785
20.3%

exp_years
Real number (ℝ≥0)

HIGH CORRELATION

Distinct41
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.805994284
Minimum1
Maximum50
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.7 MiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median4
Q310
95-th percentile23
Maximum50
Range49
Interquartile range (IQR)9

Descriptive statistics

Standard deviation7.39668262
Coefficient of variation (CV)1.086789426
Kurtosis1.820133028
Mean6.805994284
Median Absolute Deviation (MAD)3
Skewness1.522281549
Sum2424336
Variance54.71091377
MonotonicityNot monotonic
Histogram with fixed size bins (bins=41)
Value                Count     Frequency (%)
1                    109137    30.6%
3                    33755     9.5%
2                    31693     8.9%
5                    27239     7.6%
4                    26442     7.4%
10                   15890     4.5%
9                    11058     3.1%
18                   10613     3.0%
11                   9836      2.8%
6                    8625      2.4%
Other values (31)    71918     20.2%

Value                Count     Frequency (%)
1                    109137    30.6%
2                    31693     8.9%
3                    33755     9.5%
4                    26442     7.4%
5                    27239     7.6%
6                    8625      2.4%
7                    3831      1.1%
8                    4681      1.3%
9                    11058     3.1%
10                   15890     4.5%

Value                Count     Frequency (%)
50                   6         < 0.1%
49                   14        < 0.1%
48                   222       0.1%
45                   36        < 0.1%
37                   2         < 0.1%
36                   2         < 0.1%
35                   1         < 0.1%
34                   2         < 0.1%
33                   18        < 0.1%
32                   310       0.1%

pid
Real number (ℝ≥0)

Distinct52
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.253912062
Minimum1
Maximum52
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.7 MiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median1
Q31
95-th percentile2
Maximum52
Range51
Interquartile range (IQR)0

Descriptive statistics

Standard deviation1.231543519
Coefficient of variation (CV)0.9821609952
Kurtosis369.2604304
Mean1.253912062
Median Absolute Deviation (MAD)0
Skewness15.60945713
Sum446651
Variance1.516699439
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                Count     Frequency (%)
1                    309922    87.0%
2                    30762     8.6%
3                    8275      2.3%
4                    3142      0.9%
5                    1379      0.4%
6                    739       0.2%
7                    411       0.1%
8                    276       0.1%
9                    194       0.1%
10                   156       < 0.1%
Other values (42)    950       0.3%

Value                Count     Frequency (%)
1                    309922    87.0%
2                    30762     8.6%
3                    8275      2.3%
4                    3142      0.9%
5                    1379      0.4%
6                    739       0.2%
7                    411       0.1%
8                    276       0.1%
9                    194       0.1%
10                   156       < 0.1%

Value                Count     Frequency (%)
52                   1         < 0.1%
51                   1         < 0.1%
50                   1         < 0.1%
49                   1         < 0.1%
48                   2         < 0.1%
47                   2         < 0.1%
46                   3         < 0.1%
45                   3         < 0.1%
44                   3         < 0.1%
43                   3         < 0.1%

is_student
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
0
355982 
1
 
224

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters356206
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0355982
99.9%
1224
 
0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0355982
99.9%
1224
 
0.1%

Most occurring characters

ValueCountFrequency (%)
0355982
99.9%
1224
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number356206
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0355982
99.9%
1224
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Common356206
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0355982
99.9%
1224
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII356206
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0355982
99.9%
1224
 
0.1%

perceived_limited_english
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
0
347456 
1
 
8750

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters356206
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0347456
97.5%
18750
 
2.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0347456
97.5%
18750
 
2.5%

Most occurring characters

ValueCountFrequency (%)
0347456
97.5%
18750
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number356206
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0347456
97.5%
18750
 
2.5%

Most occurring scripts

ValueCountFrequency (%)
Common356206
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0347456
97.5%
18750
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII356206
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0347456
97.5%
18750
 
2.5%

perceived_age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct109
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean37.39630439
Minimum1
Maximum120
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.7 MiB

Quantile statistics

Minimum1
5-th percentile20
Q126
median35
Q348
95-th percentile60
Maximum120
Range119
Interquartile range (IQR)22

Descriptive statistics

Standard deviation13.46826227
Coefficient of variation (CV)0.3601495519
Kurtosis-0.233981078
Mean37.39630439
Median Absolute Deviation (MAD)10
Skewness0.573057818
Sum13320788
Variance181.3940885
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                Count    Frequency (%)
30                   50286    14.1%
40                   36949    10.4%
25                   36237    10.2%
50                   32814    9.2%
35                   28417    8.0%
45                   21130    5.9%
60                   18828    5.3%
20                   17182    4.8%
55                   14046    3.9%
21                   6171     1.7%
Other values (99)    94146    26.4%

Value                Count    Frequency (%)
1                    10       < 0.1%
2                    8        < 0.1%
3                    4        < 0.1%
4                    13       < 0.1%
5                    54       < 0.1%
6                    23       < 0.1%
7                    36       < 0.1%
8                    64       < 0.1%
9                    35       < 0.1%
10                   275      0.1%

Value                Count    Frequency (%)
120                  8        < 0.1%
118                  1        < 0.1%
116                  2        < 0.1%
115                  1        < 0.1%
110                  1        < 0.1%
105                  1        < 0.1%
103                  1        < 0.1%
102                  1        < 0.1%
101                  1        < 0.1%
100                  18       < 0.1%

gender2
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
1
259302 
2
95816 
3
 
569
4
 
415
0
 
104

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters356206
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row2
5th row1

Common Values

Value    Count     Frequency (%)
1        259302    72.8%
2        95816     26.9%
3        569       0.2%
4        415       0.1%
0        104       < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1259302
72.8%
295816
 
26.9%
3569
 
0.2%
4415
 
0.1%
0104
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
1259302
72.8%
295816
 
26.9%
3569
 
0.2%
4415
 
0.1%
0104
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number356206
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1259302
72.8%
295816
 
26.9%
3569
 
0.2%
4415
 
0.1%
0104
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common356206
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1259302
72.8%
295816
 
26.9%
3569
 
0.2%
4415
 
0.1%
0104
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII356206
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1259302
72.8%
295816
 
26.9%
3569
 
0.2%
4415
 
0.1%
0104
 
< 0.1%

perceived_gender
Categorical

HIGH CORRELATION

Distinct4
Distinct (%)< 0.1%
Missing104
Missing (%)< 0.1%
Memory size2.7 MiB
Male
259302 
Female
95816 
Transgender man/boy
 
569
Transgender woman/girl
 
415

Length

Max length22
Median length4
Mean length4.583082937
Min length4

Characters and Unicode

Total characters1632045
Distinct characters19
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMale
2nd rowMale
3rd rowMale
4th rowFemale
5th rowMale

Common Values

Value                     Count     Frequency (%)
Male                      259302    72.8%
Female                    95816     26.9%
Transgender man/boy       569       0.2%
Transgender woman/girl    415       0.1%
(Missing)                 104       < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
male259302
72.6%
female95816
 
26.8%
transgender984
 
0.3%
man/boy569
 
0.2%
woman/girl415
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e452902
27.8%
a357086
21.9%
l355533
21.8%
M259302
15.9%
m96800
 
5.9%
F95816
 
5.9%
n2952
 
0.2%
r2383
 
0.1%
g1399
 
0.1%
T984
 
0.1%
Other values (9)6888
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1273975
78.1%
Uppercase Letter356102
 
21.8%
Space Separator984
 
0.1%
Other Punctuation984
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e452902
35.6%
a357086
28.0%
l355533
27.9%
m96800
 
7.6%
n2952
 
0.2%
r2383
 
0.2%
g1399
 
0.1%
s984
 
0.1%
d984
 
0.1%
o984
 
0.1%
Other values (4)1968
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
M259302
72.8%
F95816
 
26.9%
T984
 
0.3%
Space Separator
ValueCountFrequency (%)
984
100.0%
Other Punctuation
ValueCountFrequency (%)
/984
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1630077
99.9%
Common1968
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e452902
27.8%
a357086
21.9%
l355533
21.8%
M259302
15.9%
m96800
 
5.9%
F95816
 
5.9%
n2952
 
0.2%
r2383
 
0.1%
g1399
 
0.1%
T984
 
0.1%
Other values (7)4920
 
0.3%
Common
ValueCountFrequency (%)
984
50.0%
/984
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1632045
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e452902
27.8%
a357086
21.9%
l355533
21.8%
M259302
15.9%
m96800
 
5.9%
F95816
 
5.9%
n2952
 
0.2%
r2383
 
0.1%
g1399
 
0.1%
T984
 
0.1%
Other values (9)6888
 
0.4%

gender_nc
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
0
356029 
5
 
177

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters356206
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value   Count    Frequency (%)
0       356029   > 99.9%
5       177      < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0356029
> 99.9%
5177
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0356029
> 99.9%
5177
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number356206
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0356029
> 99.9%
5177
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common356206
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0356029
> 99.9%
5177
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII356206
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0356029
> 99.9%
5177
 
< 0.1%

gender_non_conforming
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
0
356029 
1
 
177

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters356206
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value   Count    Frequency (%)
0       356029   > 99.9%
1       177      < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0356029
> 99.9%
1177
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0356029
> 99.9%
1177
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number356206
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0356029
> 99.9%
1177
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common356206
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0356029
> 99.9%
1177
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII356206
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0356029
> 99.9%
1177
 
< 0.1%
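
gender_nc (values 0 and 5) and gender_non_conforming (values 0 and 1) have identical counts, 356029 and 177, and are flagged as highly correlated with each other. That suggests, but does not prove, that they encode the same flag under different codes. A minimal sketch of a row-level check follows. The two short lists are made-up sample rows, not data from the report, and `value_mapping` and `is_one_to_one` are hypothetical helper names.

```python
def value_mapping(left, right):
    """Map each value in `left` to the set of values it co-occurs with in `right`."""
    mapping = {}
    for l, r in zip(left, right):
        mapping.setdefault(l, set()).add(r)
    return mapping


def is_one_to_one(left, right):
    """True when every value on each side pairs with exactly one value on the other."""
    forward = value_mapping(left, right)
    backward = value_mapping(right, left)
    return (all(len(v) == 1 for v in forward.values())
            and all(len(v) == 1 for v in backward.values()))


# Hypothetical sample rows for illustration only.
gender_nc = ["0", "0", "5", "0", "5"]
gender_non_conforming = ["0", "0", "1", "0", "1"]

print(value_mapping(gender_nc, gender_non_conforming))
print(is_one_to_one(gender_nc, gender_non_conforming))
```

If the check holds on the full columns, one of the two variables is redundant and could be dropped before modelling.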

gender
Categorical

HIGH CORRELATION

Distinct4
Distinct (%)< 0.1%
Missing104
Missing (%)< 0.1%
Memory size2.7 MiB
Male
259300 
Female
95818 
Transgender man/boy
 
569
Transgender woman/girl
 
415

Length

Max length22
Median length4
Mean length4.58309417
Min length4

Characters and Unicode

Total characters1632049
Distinct characters19
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMale
2nd rowMale
3rd rowMale
4th rowFemale
5th rowMale

Common Values

Value                    Count    Frequency (%)
Male                     259300   72.8%
Female                   95818    26.9%
Transgender man/boy      569      0.2%
Transgender woman/girl   415      0.1%
(Missing)                104      < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
male259300
72.6%
female95818
 
26.8%
transgender984
 
0.3%
man/boy569
 
0.2%
woman/girl415
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e452904
27.8%
a357086
21.9%
l355533
21.8%
M259300
15.9%
m96802
 
5.9%
F95818
 
5.9%
n2952
 
0.2%
r2383
 
0.1%
g1399
 
0.1%
T984
 
0.1%
Other values (9)6888
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1273979
78.1%
Uppercase Letter356102
 
21.8%
Space Separator984
 
0.1%
Other Punctuation984
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e452904
35.6%
a357086
28.0%
l355533
27.9%
m96802
 
7.6%
n2952
 
0.2%
r2383
 
0.2%
g1399
 
0.1%
s984
 
0.1%
d984
 
0.1%
o984
 
0.1%
Other values (4)1968
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
M259300
72.8%
F95818
 
26.9%
T984
 
0.3%
Space Separator
ValueCountFrequency (%)
984
100.0%
Other Punctuation
ValueCountFrequency (%)
/984
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1630081
99.9%
Common1968
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e452904
27.8%
a357086
21.9%
l355533
21.8%
M259300
15.9%
m96802
 
5.9%
F95818
 
5.9%
n2952
 
0.2%
r2383
 
0.1%
g1399
 
0.1%
T984
 
0.1%
Other values (7)4920
 
0.3%
Common
ValueCountFrequency (%)
984
50.0%
/984
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1632049
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e452904
27.8%
a357086
21.9%
l355533
21.8%
M259300
15.9%
m96802
 
5.9%
F95818
 
5.9%
n2952
 
0.2%
r2383
 
0.1%
g1399
 
0.1%
T984
 
0.1%
Other values (9)6888
 
0.4%

perceived_lgbt
Boolean

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size348.0 KiB
False
346430 
True
 
9776
Value   Count    Frequency (%)
False   346430   97.3%
True    9776     2.7%

race
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
White
152082 
Hispanic/Latino/a
102862 
Black/African American
71069 
Asian
17052 
Middle Eastern or South Asian
 
9415
Other values (2)
 
3726

Length

Max length29
Median length17
Mean length12.60421217
Min length5

Characters and Unicode

Total characters4489696
Distinct characters31
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowWhite
2nd rowWhite
3rd rowHispanic/Latino/a
4th rowHispanic/Latino/a
5th rowWhite

Common Values

Value                           Count    Frequency (%)
White                           152082   42.7%
Hispanic/Latino/a               102862   28.9%
Black/African American          71069    20.0%
Asian                           17052    4.8%
Middle Eastern or South Asian   9415     2.6%
Pacific Islander                2929     0.8%
Native American                 797      0.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
white152082
32.5%
hispanic/latino/a102862
21.9%
american71866
15.3%
black/african71069
15.2%
asian26467
 
5.6%
south9415
 
2.0%
middle9415
 
2.0%
or9415
 
2.0%
eastern9415
 
2.0%
islander2929
 
0.6%
Other values (2)3726
 
0.8%

Most occurring characters

ValueCountFrequency (%)
i646140
14.4%
a565127
12.6%
n387470
 
8.6%
c322724
 
7.2%
/276793
 
6.2%
t274571
 
6.1%
e246504
 
5.5%
A169402
 
3.8%
r164694
 
3.7%
h161497
 
3.6%
Other values (21)1274774
28.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3467271
77.2%
Uppercase Letter633177
 
14.1%
Other Punctuation276793
 
6.2%
Space Separator112455
 
2.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i646140
18.6%
a565127
16.3%
n387470
11.2%
c322724
9.3%
t274571
7.9%
e246504
 
7.1%
r164694
 
4.7%
h161497
 
4.7%
s141673
 
4.1%
o121692
 
3.5%
Other values (8)435179
12.6%
Uppercase Letter
ValueCountFrequency (%)
A169402
26.8%
W152082
24.0%
H102862
16.2%
L102862
16.2%
B71069
11.2%
M9415
 
1.5%
E9415
 
1.5%
S9415
 
1.5%
P2929
 
0.5%
I2929
 
0.5%
Other Punctuation
ValueCountFrequency (%)
/276793
100.0%
Space Separator
ValueCountFrequency (%)
112455
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4100448
91.3%
Common389248
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
i646140
15.8%
a565127
13.8%
n387470
 
9.4%
c322724
 
7.9%
t274571
 
6.7%
e246504
 
6.0%
A169402
 
4.1%
r164694
 
4.0%
h161497
 
3.9%
W152082
 
3.7%
Other values (19)1010237
24.6%
Common
ValueCountFrequency (%)
/276793
71.1%
112455
28.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII4489696
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i646140
14.4%
a565127
12.6%
n387470
 
8.6%
c322724
 
7.2%
/276793
 
6.2%
t274571
 
6.1%
e246504
 
5.5%
A169402
 
3.8%
r164694
 
3.7%
h161497
 
3.6%
Other values (21)1274774
28.4%

disability
Categorical

HIGH CORRELATION

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
None
339692 
Mental health condition
 
11949
Other disability
 
2022
Intellectual or developmental disability, including dementia
 
944
Speech impairment or limited use of language
 
710
Other values (3)
 
889

Length

Max length60
Median length4
Mean length4.992495915
Min length4

Characters and Unicode

Total characters1778357
Distinct characters29
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNone
2nd rowNone
3rd rowNone
4th rowNone
5th rowNone

Common Values

Value                                                          Count    Frequency (%)
None                                                           339692   95.4%
Mental health condition                                        11949    3.4%
Other disability                                               2022     0.6%
Intellectual or developmental disability, including dementia   944      0.3%
Speech impairment or limited use of language                   710      0.2%
Deafness or difficulty hearing                                 559      0.2%
Blind or limited vision                                        325      0.1%
Disability related to hyperactivity or impulsive behavior      5        < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
none339692
86.3%
condition11949
 
3.0%
health11949
 
3.0%
mental11949
 
3.0%
disability2971
 
0.8%
or2543
 
0.6%
other2022
 
0.5%
limited1035
 
0.3%
intellectual944
 
0.2%
dementia944
 
0.2%
Other values (17)7790
 
2.0%

Most occurring characters

ValueCountFrequency (%)
n383447
21.6%
e378507
21.3%
o368122
20.7%
N339692
19.1%
t46940
 
2.6%
i41810
 
2.4%
37582
 
2.1%
l35172
 
2.0%
a32964
 
1.9%
h27199
 
1.5%
Other values (19)86922
 
4.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1383625
77.8%
Uppercase Letter356206
 
20.0%
Space Separator37582
 
2.1%
Other Punctuation944
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n383447
27.7%
e378507
27.4%
o368122
26.6%
t46940
 
3.4%
i41810
 
3.0%
l35172
 
2.5%
a32964
 
2.4%
h27199
 
2.0%
d19671
 
1.4%
c15111
 
1.1%
Other values (10)34682
 
2.5%
Uppercase Letter
ValueCountFrequency (%)
N339692
95.4%
M11949
 
3.4%
O2022
 
0.6%
I944
 
0.3%
S710
 
0.2%
D564
 
0.2%
B325
 
0.1%
Space Separator
ValueCountFrequency (%)
37582
100.0%
Other Punctuation
ValueCountFrequency (%)
,944
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1739831
97.8%
Common38526
 
2.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
n383447
22.0%
e378507
21.8%
o368122
21.2%
N339692
19.5%
t46940
 
2.7%
i41810
 
2.4%
l35172
 
2.0%
a32964
 
1.9%
h27199
 
1.6%
d19671
 
1.1%
Other values (17)66307
 
3.8%
Common
ValueCountFrequency (%)
37582
97.5%
,944
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1778357
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n383447
21.6%
e378507
21.3%
o368122
20.7%
N339692
19.1%
t46940
 
2.6%
i41810
 
2.4%
37582
 
2.1%
l35172
 
2.0%
a32964
 
1.9%
h27199
 
1.5%
Other values (19)86922
 
4.9%

reason_for_stop_code
Unsupported

REJECTED
UNSUPPORTED

Missing0
Missing (%)0.0%
Memory size2.7 MiB

reason_for_stop_code_text
Categorical

HIGH CARDINALITY
MISSING

Distinct1566
Distinct (%)0.5%
Missing20564
Missing (%)5.8%
Memory size2.7 MiB
65002 ZZ - LOCAL ORDINANCE VIOL (I) 65002
 
23635
647(E) PC - DIS CON:LODGE W/O CONSENT (M) 32111
 
16917
602 PC - TRESPASSING (M) 32022
 
16078
65000 ZZ - LOCAL ORDINANCE VIOL (M) 65000
 
13955
22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106
 
13546
Other values (1561)
251511 

Length

Max length67
Median length46
Mean length44.12087581
Min length3

Characters and Unicode

Total characters14808819
Distinct characters67
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique409 ?
Unique (%)0.1%

Sample

1st row647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
2nd row22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106
3rd row415(1) PC - FIGHT IN PUBLIC PLACE (M) 53072
4th row415(1) PC - FIGHT IN PUBLIC PLACE (M) 53072
5th row22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106

Common Values

Value                                                Count    Frequency (%)
65002 ZZ - LOCAL ORDINANCE VIOL (I) 65002            23635    6.6%
647(E) PC - DIS CON:LODGE W/O CONSENT (M) 32111      16917    4.7%
602 PC - TRESPASSING (M) 32022                       16078    4.5%
65000 ZZ - LOCAL ORDINANCE VIOL (M) 65000            13955    3.9%
22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106       13546    3.8%
22450(A) VC - FAIL STOP VEH:XWALK/ETC (I) 54167      12563    3.5%
NA - XX ZZ - COMMUNITY CARETAKING (X) 99990          11891    3.3%
23123.5 VC - NO HND HLD DEVICE W/DRIVE (I) 54655     9038     2.5%
647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005         8907     2.5%
21461(A) VC - DRIVER FAIL OBEY SIGN/ETC (I) 54146    8463     2.4%
Other values (1556)                                  200649   56.3%
(Missing)                                            20564    5.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
352617
 
12.7%
i186180
 
6.7%
vc155523
 
5.6%
m114251
 
4.1%
pc104115
 
3.8%
zz53711
 
1.9%
6500247270
 
1.7%
viol45233
 
1.6%
fail44851
 
1.6%
ordinance37590
 
1.4%
Other values (5434)1625176
58.7%

Most occurring characters

ValueCountFrequency (%)
2430875
 
16.4%
I699463
 
4.7%
E668412
 
4.5%
C646125
 
4.4%
A571519
 
3.9%
(547835
 
3.7%
)547791
 
3.7%
O543233
 
3.7%
0524036
 
3.5%
2486597
 
3.3%
Other values (57)7142933
48.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter7514356
50.7%
Decimal Number3118287
21.1%
Space Separator2430875
 
16.4%
Open Punctuation547835
 
3.7%
Close Punctuation547791
 
3.7%
Dash Punctuation353078
 
2.4%
Other Punctuation295705
 
2.0%
Currency Symbol676
 
< 0.1%
Math Symbol171
 
< 0.1%
Lowercase Letter45
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I699463
 
9.3%
E668412
 
8.9%
C646125
 
8.6%
A571519
 
7.6%
O543233
 
7.2%
L470461
 
6.3%
N470388
 
6.3%
T383542
 
5.1%
S380058
 
5.1%
P363154
 
4.8%
Other values (16)2318001
30.8%
Lowercase Letter
ValueCountFrequency (%)
i7
15.6%
t5
11.1%
e4
 
8.9%
l4
 
8.9%
r3
 
6.7%
u3
 
6.7%
d3
 
6.7%
s2
 
4.4%
a2
 
4.4%
o2
 
4.4%
Other values (8)10
22.2%
Decimal Number
ValueCountFrequency (%)
0524036
16.8%
2486597
15.6%
5483380
15.5%
4400380
12.8%
1392265
12.6%
6297787
9.5%
3238523
7.6%
9130885
 
4.2%
7105416
 
3.4%
859018
 
1.9%
Other Punctuation
ValueCountFrequency (%)
/149979
50.7%
:119152
40.3%
.23237
 
7.9%
&3202
 
1.1%
'85
 
< 0.1%
"48
 
< 0.1%
,2
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
(547835
100.0%
Close Punctuation
ValueCountFrequency (%)
)547791
100.0%
Space Separator
ValueCountFrequency (%)
2430875
100.0%
Dash Punctuation
ValueCountFrequency (%)
-353078
100.0%
Math Symbol
ValueCountFrequency (%)
+171
100.0%
Currency Symbol
ValueCountFrequency (%)
$676
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7514401
50.7%
Common7294418
49.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
I699463
 
9.3%
E668412
 
8.9%
C646125
 
8.6%
A571519
 
7.6%
O543233
 
7.2%
L470461
 
6.3%
N470388
 
6.3%
T383542
 
5.1%
S380058
 
5.1%
P363154
 
4.8%
Other values (34)2318046
30.8%
Common
ValueCountFrequency (%)
2430875
33.3%
(547835
 
7.5%
)547791
 
7.5%
0524036
 
7.2%
2486597
 
6.7%
5483380
 
6.6%
4400380
 
5.5%
1392265
 
5.4%
-353078
 
4.8%
6297787
 
4.1%
Other values (13)830394
 
11.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII14808819
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2430875
 
16.4%
I699463
 
4.7%
E668412
 
4.5%
C646125
 
4.4%
A571519
 
3.9%
(547835
 
3.7%
)547791
 
3.7%
O543233
 
3.7%
0524036
 
3.5%
2486597
 
3.3%
Other values (57)7142933
48.2%

reason_for_stop
Categorical

HIGH CORRELATION

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.7 MiB
Reasonable Suspicion
186786 
Traffic Violation
148855 
Consensual Encounter resulting in a search
 
5920
Investigation to determine whether the person was truant
 
5751
Known to be on Parole / Probation / PRCS / Mandatory Supervision
 
5095
Other values (5)
 
3799

Length

Max length100
Median length20
Mean length20.67503916
Min length7

Characters and Unicode

Total characters7364573
Distinct characters44
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)< 0.1%

Sample

1st rowReasonable Suspicion
2nd rowTraffic Violation
3rd rowReasonable Suspicion
4th rowReasonable Suspicion
5th rowTraffic Violation

Common Values

Value                                                                                                  Count    Frequency (%)
Reasonable Suspicion                                                                                   186786   52.4%
Traffic Violation                                                                                      148855   41.8%
Consensual Encounter resulting in a search                                                             5920     1.7%
Investigation to determine whether the person was truant                                               5751     1.6%
Known to be on Parole / Probation / PRCS / Mandatory Supervision                                       5095     1.4%
Knowledge of outstanding arrest warrant/wanted person                                                  3764     1.1%
Determine whether the student violated school policy                                                   27       < 0.1%
Possible conduct warranting discipline under Education Code sections 48900, 48900.2, 48900.3, 48900.   6        < 0.1%
N,None"                                                                                                1        < 0.1%
N,Person removed from vehicle by order"                                                                1        < 0.1%
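
The two single-occurrence values that end in a stray double quote (`N,None"` and `N,Person removed from vehicle by order"`) look like quoting or delimiter residue from the source file rather than genuine stop reasons. That reading is an assumption, since the report does not say how the file was parsed. A minimal sketch for flagging such values, with `has_unbalanced_quote` as a hypothetical helper name:

```python
# Distinct reason_for_stop values as listed in the Common Values table above.
reason_values = [
    "Reasonable Suspicion",
    "Traffic Violation",
    "Consensual Encounter resulting in a search",
    "Investigation to determine whether the person was truant",
    "Known to be on Parole / Probation / PRCS / Mandatory Supervision",
    "Knowledge of outstanding arrest warrant/wanted person",
    "Determine whether the student violated school policy",
    'N,None"',
    'N,Person removed from vehicle by order"',
]


def has_unbalanced_quote(value):
    """Flag values with an odd number of double quotes, a common sign of a split field."""
    return value.count('"') % 2 == 1


suspects = [v for v in reason_values if has_unbalanced_quote(v)]
print(suspects)
```

Flagged rows are worth inspecting in the raw file before deciding whether to repair or drop them.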

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
suspicion186786
22.3%
reasonable186786
22.3%
traffic148855
17.8%
violation148855
17.8%
15285
 
1.8%
to10846
 
1.3%
person9515
 
1.1%
resulting5920
 
0.7%
search5920
 
0.7%
a5920
 
0.7%
Other values (44)112114
13.4%

Most occurring characters

ValueCountFrequency (%)
i868403
11.8%
o756334
 
10.3%
a745540
 
10.1%
n639954
 
8.7%
480596
 
6.5%
e477618
 
6.5%
s430977
 
5.9%
l356434
 
4.8%
c347566
 
4.7%
f301475
 
4.1%
Other values (34)1959676
26.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6126215
83.2%
Uppercase Letter738541
 
10.0%
Space Separator480596
 
6.5%
Other Punctuation19089
 
0.3%
Decimal Number132
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i868403
14.2%
o756334
12.3%
a745540
12.2%
n639954
10.4%
e477618
7.8%
s430977
7.0%
l356434
 
5.8%
c347566
 
5.7%
f301475
 
4.9%
t240994
 
3.9%
Other values (11)960920
15.7%
Uppercase Letter
ValueCountFrequency (%)
S196976
26.7%
R191881
26.0%
T148855
20.2%
V148855
20.2%
P15292
 
2.1%
C11021
 
1.5%
K8859
 
1.2%
E5926
 
0.8%
I5751
 
0.8%
M5095
 
0.7%
Other values (2)30
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
048
36.4%
424
18.2%
824
18.2%
924
18.2%
26
 
4.5%
36
 
4.5%
Other Punctuation
ValueCountFrequency (%)
/19049
99.8%
,20
 
0.1%
.18
 
0.1%
"2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
480596
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6864756
93.2%
Common499817
 
6.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
i868403
12.7%
o756334
11.0%
a745540
10.9%
n639954
 
9.3%
e477618
 
7.0%
s430977
 
6.3%
l356434
 
5.2%
c347566
 
5.1%
f301475
 
4.4%
t240994
 
3.5%
Other values (23)1699461
24.8%
Common
ValueCountFrequency (%)
480596
96.2%
/19049
 
3.8%
048
 
< 0.1%
424
 
< 0.1%
824
 
< 0.1%
924
 
< 0.1%
,20
 
< 0.1%
.18
 
< 0.1%
26
 
< 0.1%
36
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII7364573
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i868403
11.8%
o756334
 
10.3%
a745540
 
10.1%
n639954
 
8.7%
480596
 
6.5%
e477618
 
6.5%
s430977
 
5.9%
l356434
 
4.8%
c347566
 
4.7%
f301475
 
4.1%
Other values (34)1959676
26.6%

reason_for_stop_detail
Categorical

HIGH CORRELATION
MISSING

Distinct15
Distinct (%)< 0.1%
Missing20559
Missing (%)5.8%
Memory size2.7 MiB
Moving Violation
93602 
Officer witnessed commission of a crime
74999 
Matched suspect description
60965 
Equipment Violation
41879 
Other Reasonable Suspicion of a crime
37660 
Other values (10)
26542 

Length

Max length73
Median length27
Mean length28.78694283
Min length16

Characters and Unicode

Total characters9662251
Distinct characters47
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowOfficer witnessed commission of a crime
2nd rowMoving Violation
3rd rowMatched suspect description
4th rowOther Reasonable Suspicion of a crime
5th rowMoving Violation

Common Values

Value                                                      Count   Frequency (%)
Moving Violation                                           93602   26.3%
Officer witnessed commission of a crime                    74999   21.1%
Matched suspect description                                60965   17.1%
Equipment Violation                                        41879   11.8%
Other Reasonable Suspicion of a crime                      37660   10.6%
Non-moving Violation, including Registration Violation     13374   3.8%
Witness or Victim identification of Suspect at the scene   9597    2.7%
Actions indicative of casing a victim or location          1149    0.3%
Actions indicative of drug transaction                     1128    0.3%
Actions indicative of engaging in violent crime            513     0.1%
Other values (5)                                           781     0.2%
(Missing)                                                  20559   5.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
violation162229
12.4%
of125311
 
9.6%
a113808
 
8.7%
crime113172
 
8.7%
moving93602
 
7.2%
officer74999
 
5.7%
commission74999
 
5.7%
witnessed74999
 
5.7%
suspect70562
 
5.4%
description60965
 
4.7%
Other values (44)339737
26.0%

Most occurring characters

ValueCountFrequency (%)
i1224651
12.7%
968736
 
10.0%
o898184
 
9.3%
e751760
 
7.8%
n702237
 
7.3%
s616623
 
6.4%
t605565
 
6.3%
c547197
 
5.7%
a453533
 
4.7%
m329173
 
3.4%
Other values (37)2564592
26.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8059403
83.4%
Space Separator968736
 
10.0%
Uppercase Letter607314
 
6.3%
Other Punctuation13381
 
0.1%
Dash Punctuation13380
 
0.1%
Decimal Number35
 
< 0.1%
Open Punctuation1
 
< 0.1%
Close Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i1224651
15.2%
o898184
11.1%
e751760
9.3%
n702237
8.7%
s616623
 
7.7%
t605565
 
7.5%
c547197
 
6.8%
a453533
 
5.6%
m329173
 
4.1%
r314221
 
3.9%
Other values (15)1616259
20.1%
Uppercase Letter
ValueCountFrequency (%)
V171826
28.3%
M154567
25.5%
O113169
18.6%
R51034
 
8.4%
S48038
 
7.9%
E41879
 
6.9%
N13374
 
2.2%
W9597
 
1.6%
A3055
 
0.5%
C510
 
0.1%
Decimal Number
ValueCountFrequency (%)
012
34.3%
48
22.9%
86
17.1%
96
17.1%
73
 
8.6%
Other Punctuation
ValueCountFrequency (%)
,13376
> 99.9%
.5
 
< 0.1%
Space Separator
ValueCountFrequency (%)
968736
100.0%
Dash Punctuation
ValueCountFrequency (%)
-13380
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8666717
89.7%
Common995534
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
i1224651
14.1%
o898184
10.4%
e751760
 
8.7%
n702237
 
8.1%
s616623
 
7.1%
t605565
 
7.0%
c547197
 
6.3%
a453533
 
5.2%
m329173
 
3.8%
r314221
 
3.6%
Other values (26)2223573
25.7%
Common
ValueCountFrequency (%)
968736
97.3%
-13380
 
1.3%
,13376
 
1.3%
012
 
< 0.1%
48
 
< 0.1%
86
 
< 0.1%
96
 
< 0.1%
.5
 
< 0.1%
73
 
< 0.1%
(1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII9662251
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i1224651
12.7%
968736
 
10.0%
o898184
 
9.3%
e751760
 
7.8%
n702237
 
7.3%
s616623
 
6.4%
t605565
 
6.3%
c547197
 
5.7%
a453533
 
4.7%
m329173
 
3.4%
Other values (37)2564592
26.5%

reason_for_stop_explanation
Categorical

HIGH CARDINALITY

Distinct170749
Distinct (%)47.9%
Missing2
Missing (%)< 0.1%
Memory size2.7 MiB
Speeding
 
4225
cell phone
 
3756
radio call
 
3347
encroachment
 
3007
stop sign
 
2694
Other values (170744)
339175 

Length

Max length100
Median length21
Mean length28.53074362
Min length3

Characters and Unicode

Total characters10162765
Distinct characters92
Distinct categories13 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique146502 ?
Unique (%)41.1%

Sample

1st rowstaggering, unable to safely walk
2nd rowSpeeding
3rd rowBoth parties involved in argument.
4th rowBoth parties engaged in argument.
5th rowUNSAFE DRIVING

Common Values

Value                   Count    Frequency (%)
Speeding                4225     1.2%
cell phone              3756     1.1%
radio call              3347     0.9%
encroachment            3007     0.8%
stop sign               2694     0.8%
SPEED                   2192     0.6%
speeding                2104     0.6%
STOP SIGN               2062     0.6%
CELL PHONE              1810     0.5%
ped stop                1768     0.5%
Other values (170739)   329239   92.4%
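
Several of the top values differ only by letter case ("Speeding" and "speeding", "stop sign" and "STOP SIGN", "cell phone" and "CELL PHONE"), which inflates the distinct count of this free-text column. The sketch below merges such variants for the ten listed values. It is a minimal illustration using case folding and whitespace collapsing only. It does not merge near-synonyms such as "SPEED" and "speeding", and `merge_case_variants` is a hypothetical helper name.

```python
from collections import Counter

# Top ten values and counts copied from the Common Values table above.
top_values = {
    "Speeding": 4225,
    "cell phone": 3756,
    "radio call": 3347,
    "encroachment": 3007,
    "stop sign": 2694,
    "SPEED": 2192,
    "speeding": 2104,
    "STOP SIGN": 2062,
    "CELL PHONE": 1810,
    "ped stop": 1768,
}


def merge_case_variants(counts):
    """Combine counts of values that match after case folding and whitespace collapsing."""
    merged = Counter()
    for value, n in counts.items():
        key = " ".join(value.split()).casefold()
        merged[key] += n
    return merged


merged = merge_case_variants(top_values)
for key, n in merged.most_common():
    print(f"{key:<14}{n}")
```

Applied to the full column, the same normalisation would likely reduce the 170749 distinct values, although by how much cannot be determined from the report alone.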

Length

Histogram of lengths of the category
ValueCountFrequency (%)
in48869
 
2.8%
of44153
 
2.6%
subject43865
 
2.6%
on40241
 
2.3%
a35235
 
2.1%
was31443
 
1.8%
to30175
 
1.8%
stop28949
 
1.7%
call27392
 
1.6%
radio24561
 
1.4%
Other values (26554)1363487
79.3%

Most occurring characters

ValueCountFrequency (%)
1389317
 
13.7%
e644662
 
6.3%
i494760
 
4.9%
t463450
 
4.6%
n461364
 
4.5%
a460065
 
4.5%
o432574
 
4.3%
s365550
 
3.6%
r350200
 
3.4%
l317730
 
3.1%
Other values (82)4783093
47.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5902529
58.1%
Uppercase Letter2622930
25.8%
Space Separator1389317
 
13.7%
Decimal Number154342
 
1.5%
Other Punctuation81885
 
0.8%
Dash Punctuation5739
 
0.1%
Open Punctuation2923
 
< 0.1%
Close Punctuation2848
 
< 0.1%
Math Symbol180
 
< 0.1%
Currency Symbol58
 
< 0.1%
Other values (3)14
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e644662
 
10.9%
i494760
 
8.4%
t463450
 
7.9%
n461364
 
7.8%
a460065
 
7.8%
o432574
 
7.3%
s365550
 
6.2%
r350200
 
5.9%
l317730
 
5.4%
d257388
 
4.4%
Other values (16)1654786
28.0%
Uppercase Letter
ValueCountFrequency (%)
E273695
 
10.4%
I215910
 
8.2%
T196201
 
7.5%
N196005
 
7.5%
S191447
 
7.3%
O188860
 
7.2%
A187968
 
7.2%
R158051
 
6.0%
L142726
 
5.4%
D122400
 
4.7%
Other values (16)749667
28.6%
Other Punctuation
ValueCountFrequency (%)
.50886
62.1%
,15856
 
19.4%
/10890
 
13.3%
'2007
 
2.5%
&635
 
0.8%
"553
 
0.7%
;249
 
0.3%
:229
 
0.3%
*209
 
0.3%
#186
 
0.2%
Other values (4)185
 
0.2%
Decimal Number
ValueCountFrequency (%)
535598
23.1%
134125
22.1%
026252
17.0%
417437
11.3%
213134
 
8.5%
68780
 
5.7%
35051
 
3.3%
75039
 
3.3%
84729
 
3.1%
94197
 
2.7%
Math Symbol
ValueCountFrequency (%)
+147
81.7%
>17
 
9.4%
<8
 
4.4%
=7
 
3.9%
~1
 
0.6%
Open Punctuation
ValueCountFrequency (%)
(2897
99.1%
[26
 
0.9%
Close Punctuation
ValueCountFrequency (%)
)2845
99.9%
]3
 
0.1%
Modifier Symbol
ValueCountFrequency (%)
^5
62.5%
`3
37.5%
Space Separator
ValueCountFrequency (%)
1389317
100.0%
Dash Punctuation
ValueCountFrequency (%)
-5739
100.0%
Currency Symbol
ValueCountFrequency (%)
$58
100.0%
Connector Punctuation
ValueCountFrequency (%)
_5
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8525459
83.9%
Common1637306
 
16.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e644662
 
7.6%
i494760
 
5.8%
t463450
 
5.4%
n461364
 
5.4%
a460065
 
5.4%
o432574
 
5.1%
s365550
 
4.3%
r350200
 
4.1%
l317730
 
3.7%
E273695
 
3.2%
Other values (42)4261409
50.0%
Common
ValueCountFrequency (%)
1389317
84.9%
.50886
 
3.1%
535598
 
2.2%
134125
 
2.1%
026252
 
1.6%
417437
 
1.1%
,15856
 
1.0%
213134
 
0.8%
/10890
 
0.7%
68780
 
0.5%
Other values (30)35031
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII10162765
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1389317
 
13.7%
e644662
 
6.3%
i494760
 
4.9%
t463450
 
4.6%
n461364
 
4.5%
a460065
 
4.5%
o432574
 
4.3%
s365550
 
3.6%
r350200
 
3.4%
l317730
 
3.1%
Other values (82)4783093
47.1%

action
Categorical

HIGH CORRELATION

Distinct25
Distinct (%)< 0.1%
Missing1
Missing (%)< 0.1%
Memory size2.7 MiB
None
222514 
Handcuffed or flex cuffed
38323 
Curbside detention
34942 
Search of person was conducted
 
17858
Patrol car detention
 
14709
Other values (20)
27859 

Length

Max length52
Median length4
Mean length11.63413765
Min length4

Characters and Unicode

Total characters4144138
Distinct characters37
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)< 0.1%

Sample

1st rowNone
2nd rowNone
3rd rowCurbside detention
4th rowCurbside detention
5th rowNone

Common Values

Value                                  Count    Frequency (%)
None                                   222514   62.5%
Handcuffed or flex cuffed              38323    10.8%
Curbside detention                     34942    9.8%
Search of person was conducted         17858    5.0%
Patrol car detention                   14709    4.1%
Search of property was conducted       6978     2.0%
Person removed from vehicle by order   5217     1.5%
Asked for consent to search person     3309     0.9%
Person photographed                    2828     0.8%
Asked for consent to search property   2376     0.7%
Other values (15)                      7151     2.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none222514
31.3%
detention49651
 
7.0%
or40035
 
5.6%
flex38323
 
5.4%
handcuffed38323
 
5.4%
cuffed38323
 
5.4%
curbside34942
 
4.9%
search30593
 
4.3%
person29672
 
4.2%
conducted26686
 
3.8%
Other values (47)161015
22.7%

Most occurring characters

ValueCountFrequency (%)
e628279
15.2%
o463527
11.2%
n431783
10.4%
353872
 
8.5%
d279068
 
6.7%
f227609
 
5.5%
N222514
 
5.4%
r219094
 
5.3%
c195054
 
4.7%
t175927
 
4.2%
Other values (27)947411
22.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3432482
82.8%
Uppercase Letter357784
 
8.6%
Space Separator353872
 
8.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e628279
18.3%
o463527
13.5%
n431783
12.6%
d279068
8.1%
f227609
 
6.6%
r219094
 
6.4%
c195054
 
5.7%
t175927
 
5.1%
u139974
 
4.1%
a132018
 
3.8%
Other values (15)540149
15.7%
Uppercase Letter
ValueCountFrequency (%)
N222514
62.2%
H38323
 
10.7%
C35135
 
9.8%
P26048
 
7.3%
S24836
 
6.9%
A5715
 
1.6%
V2981
 
0.8%
F2144
 
0.6%
E72
 
< 0.1%
B11
 
< 0.1%
Space Separator
ValueCountFrequency (%)
353872
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3790266
91.5%
Common353872
 
8.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e628279
16.6%
o463527
12.2%
n431783
11.4%
d279068
 
7.4%
f227609
 
6.0%
N222514
 
5.9%
r219094
 
5.8%
c195054
 
5.1%
t175927
 
4.6%
u139974
 
3.7%
Other values (26)807437
21.3%
Common
ValueCountFrequency (%)
353872
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4144138
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e628279
15.2%
o463527
11.2%
n431783
10.4%
353872
 
8.5%
d279068
 
6.7%
f227609
 
5.5%
N222514
 
5.4%
r219094
 
5.3%
c195054
 
4.7%
t175927
 
4.2%
Other values (27)947411
22.9%

consented
Boolean

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 350521
Missing (%): 98.4%
Memory size: 695.8 KiB
True
 
4914
False
 
771
(Missing)
350521 
Value | Count | Frequency (%)
True | 4914 | 1.4%
False | 771 | 0.2%
(Missing) | 350521 | 98.4%

basis_for_search
Categorical

HIGH CORRELATION
MISSING

Distinct: 13
Distinct (%): < 0.1%
Missing: 282703
Missing (%): 79.4%
Memory size: 2.7 MiB
Incident to arrest
36676 
Condition of parole / probation/ PRCS / mandatory supervision
21787 
Consent given
5532 
Officer Safety/safety of others
4392 
Vehicle inventory
 
1881
Other values (8)
 
3235

Length

Max length: 61
Median length: 18
Mean length: 31.11370964
Min length: 13

Characters and Unicode

Total characters: 2286951
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Incident to arrest
2nd row: Incident to arrest
3rd row: Incident to arrest
4th row: Condition of parole / probation/ PRCS / mandatory supervision
5th row: Incident to arrest

Common Values

Value | Count | Frequency (%)
Incident to arrest | 36676 | 10.3%
Condition of parole / probation/ PRCS / mandatory supervision | 21787 | 6.1%
Consent given | 5532 | 1.6%
Officer Safety/safety of others | 4392 | 1.2%
Vehicle inventory | 1881 | 0.5%
Visible contraband | 848 | 0.2%
Evidence of crime | 775 | 0.2%
Suspected weapons | 674 | 0.2%
Odor of contraband | 453 | 0.1%
Search Warrant | 310 | 0.1%
Other values (3) | 175 | < 0.1%
(Missing) | 282703 | 79.4%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
43574
12.6%
to36676
10.6%
incident36676
10.6%
arrest36676
10.6%
of27414
7.9%
prcs21787
 
6.3%
supervision21787
 
6.3%
condition21787
 
6.3%
mandatory21787
 
6.3%
parole21787
 
6.3%
Other values (24)56273
16.3%

Most occurring characters

ValueCountFrequency (%)
272721
11.9%
o230875
10.1%
n207517
 
9.1%
t198613
 
8.7%
r174944
 
7.6%
i162900
 
7.1%
e157544
 
6.9%
a136989
 
6.0%
s97096
 
4.2%
d83468
 
3.6%
Other values (23)564284
24.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1778964
77.8%
Space Separator272721
 
11.9%
Uppercase Letter165353
 
7.2%
Other Punctuation69913
 
3.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o230875
13.0%
n207517
11.7%
t198613
11.2%
r174944
9.8%
i162900
9.2%
e157544
8.9%
a136989
7.7%
s97096
 
5.5%
d83468
 
4.7%
p66723
 
3.8%
Other values (12)262295
14.7%
Uppercase Letter
ValueCountFrequency (%)
C49114
29.7%
I36676
22.2%
S27170
16.4%
P21787
13.2%
R21787
13.2%
O4845
 
2.9%
V2729
 
1.7%
E935
 
0.6%
W310
 
0.2%
Space Separator
ValueCountFrequency (%)
272721
100.0%
Other Punctuation
ValueCountFrequency (%)
/69913
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1944317
85.0%
Common342634
 
15.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o230875
11.9%
n207517
10.7%
t198613
10.2%
r174944
9.0%
i162900
 
8.4%
e157544
 
8.1%
a136989
 
7.0%
s97096
 
5.0%
d83468
 
4.3%
p66723
 
3.4%
Other values (21)427648
22.0%
Common
ValueCountFrequency (%)
272721
79.6%
/69913
 
20.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII2286951
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
272721
11.9%
o230875
10.1%
n207517
 
9.1%
t198613
 
8.7%
r174944
 
7.6%
i162900
 
7.1%
e157544
 
6.9%
a136989
 
6.0%
s97096
 
4.2%
d83468
 
3.6%
Other values (23)564284
24.7%

basis_for_search_explanation
Categorical

HIGH CARDINALITY
MISSING

Distinct: 26425
Distinct (%): 49.3%
Missing: 302561
Missing (%): 84.9%
Memory size: 2.7 MiB
incident to arrest
 
1599
arrest
 
1426
search incident to arrest
 
1097
arrested
 
598
Incident to arrest
 
511
Other values (26420)
48414 

Length

Max length: 100
Median length: 22
Mean length: 27.82494175
Min length: 1

Characters and Unicode

Total characters: 1492669
Distinct characters: 88
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 22965
Unique (%): 42.8%

Sample

1st row: subject was transported to detox and was searched accordingly.
2nd row: INCIDENT TO ARREST
3rd row: DRUNK IN PUBLIC
4th row: 4th waiver
5th row: FIGHTING

Common Values

Value | Count | Frequency (%)
incident to arrest | 1599 | 0.4%
arrest | 1426 | 0.4%
search incident to arrest | 1097 | 0.3%
arrested | 598 | 0.2%
Incident to arrest | 511 | 0.1%
INCIDENT TO ARREST | 483 | 0.1%
searched incident to arrest | 481 | 0.1%
5150 hold | 391 | 0.1%
warrant | 384 | 0.1%
SEARCH INCIDENT TO ARREST | 375 | 0.1%
Other values (26415) | 46300 | 13.0%
(Missing) | 302561 | 84.9%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
arrest17262
 
7.1%
to15922
 
6.5%
for12136
 
5.0%
incident9689
 
4.0%
arrested8073
 
3.3%
subject7567
 
3.1%
search7547
 
3.1%
was7070
 
2.9%
searched5396
 
2.2%
and5193
 
2.1%
Other values (6895)148694
60.8%

Most occurring characters

ValueCountFrequency (%)
195205
 
13.1%
e116049
 
7.8%
r100287
 
6.7%
t87551
 
5.9%
a85512
 
5.7%
n70046
 
4.7%
s66555
 
4.5%
o62638
 
4.2%
i53202
 
3.6%
d50141
 
3.4%
Other values (78)605483
40.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter924166
61.9%
Uppercase Letter309379
 
20.7%
Space Separator195205
 
13.1%
Decimal Number46230
 
3.1%
Other Punctuation12075
 
0.8%
Open Punctuation2503
 
0.2%
Close Punctuation2503
 
0.2%
Dash Punctuation557
 
< 0.1%
Math Symbol38
 
< 0.1%
Currency Symbol10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e116049
12.6%
r100287
10.9%
t87551
9.5%
a85512
9.3%
n70046
 
7.6%
s66555
 
7.2%
o62638
 
6.8%
i53202
 
5.8%
d50141
 
5.4%
c49111
 
5.3%
Other values (16)183074
19.8%
Uppercase Letter
ValueCountFrequency (%)
E33782
10.9%
R29269
9.5%
A28661
9.3%
T26582
 
8.6%
S26387
 
8.5%
N22069
 
7.1%
O20050
 
6.5%
I19547
 
6.3%
C17521
 
5.7%
D16630
 
5.4%
Other values (16)68881
22.3%
Other Punctuation
ValueCountFrequency (%)
.8116
67.2%
,2215
 
18.3%
/888
 
7.4%
&423
 
3.5%
'251
 
2.1%
:57
 
0.5%
;48
 
0.4%
"35
 
0.3%
*13
 
0.1%
#12
 
0.1%
Other values (3)17
 
0.1%
Decimal Number
ValueCountFrequency (%)
511278
24.4%
110048
21.7%
06660
14.4%
44943
10.7%
23745
 
8.1%
62886
 
6.2%
72476
 
5.4%
31816
 
3.9%
91418
 
3.1%
8960
 
2.1%
Math Symbol
ValueCountFrequency (%)
>29
76.3%
=5
 
13.2%
+3
 
7.9%
<1
 
2.6%
Open Punctuation
ValueCountFrequency (%)
(2502
> 99.9%
[1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
)2501
99.9%
]2
 
0.1%
Modifier Symbol
ValueCountFrequency (%)
^2
66.7%
`1
33.3%
Space Separator
ValueCountFrequency (%)
195205
100.0%
Dash Punctuation
ValueCountFrequency (%)
-557
100.0%
Currency Symbol
ValueCountFrequency (%)
$10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1233545
82.6%
Common259124
 
17.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e116049
 
9.4%
r100287
 
8.1%
t87551
 
7.1%
a85512
 
6.9%
n70046
 
5.7%
s66555
 
5.4%
o62638
 
5.1%
i53202
 
4.3%
d50141
 
4.1%
c49111
 
4.0%
Other values (42)492453
39.9%
Common
ValueCountFrequency (%)
195205
75.3%
511278
 
4.4%
110048
 
3.9%
.8116
 
3.1%
06660
 
2.6%
44943
 
1.9%
23745
 
1.4%
62886
 
1.1%
(2502
 
1.0%
)2501
 
1.0%
Other values (26)11240
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1492669
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
195205
 
13.1%
e116049
 
7.8%
r100287
 
6.7%
t87551
 
5.9%
a85512
 
5.7%
n70046
 
4.7%
s66555
 
4.5%
o62638
 
4.2%
i53202
 
3.6%
d50141
 
3.4%
Other values (78)605483
40.6%

basis_for_property_seizure
Categorical

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): 0.1%
Missing: 348426
Missing (%): 97.8%
Memory size: 2.7 MiB
Evidence
3272 
Contraband
2865 
Impound of vehicle
1054 
Safekeeping as allowed by law/statute
510 
Abandoned property
 
75
Other values (2)
 
4

Length

Max length: 37
Median length: 10
Mean length: 12.09974293
Min length: 8

Characters and Unicode

Total characters: 94136
Distinct characters: 29
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Impound of vehicle
2nd row: Evidence
3rd row: Impound of vehicle
4th row: Evidence
5th row: Contraband

Common Values

Value | Count | Frequency (%)
Evidence | 3272 | 0.9%
Contraband | 2865 | 0.8%
Impound of vehicle | 1054 | 0.3%
Safekeeping as allowed by law/statute | 510 | 0.1%
Abandoned property | 75 | < 0.1%
Suspected violation of school policy | 2 | < 0.1%
Citation for infraction | 2 | < 0.1%
(Missing) | 348426 | 97.8%

Length

[Figure: histogram of lengths of the category]
[Figure: pie chart]
ValueCountFrequency (%)
evidence3272
27.2%
contraband2865
23.8%
of1056
 
8.8%
impound1054
 
8.8%
vehicle1054
 
8.8%
law/statute510
 
4.2%
allowed510
 
4.2%
as510
 
4.2%
safekeeping510
 
4.2%
by510
 
4.2%
Other values (9)164
 
1.4%

Most occurring characters

ValueCountFrequency (%)
e11356
12.1%
n10724
 
11.4%
a8361
 
8.9%
d7853
 
8.3%
o5651
 
6.0%
i4850
 
5.2%
t4480
 
4.8%
c4334
 
4.6%
v4328
 
4.6%
4235
 
4.5%
Other values (19)27964
29.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter81611
86.7%
Uppercase Letter7780
 
8.3%
Space Separator4235
 
4.5%
Other Punctuation510
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e11356
13.9%
n10724
13.1%
a8361
10.2%
d7853
9.6%
o5651
 
6.9%
i4850
 
5.9%
t4480
 
5.5%
c4334
 
5.3%
v4328
 
5.3%
b3450
 
4.2%
Other values (12)16224
19.9%
Uppercase Letter
ValueCountFrequency (%)
E3272
42.1%
C2867
36.9%
I1054
 
13.5%
S512
 
6.6%
A75
 
1.0%
Space Separator
ValueCountFrequency (%)
4235
100.0%
Other Punctuation
ValueCountFrequency (%)
/510
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin89391
95.0%
Common4745
 
5.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e11356
12.7%
n10724
12.0%
a8361
 
9.4%
d7853
 
8.8%
o5651
 
6.3%
i4850
 
5.4%
t4480
 
5.0%
c4334
 
4.8%
v4328
 
4.8%
b3450
 
3.9%
Other values (17)24004
26.9%
Common
ValueCountFrequency (%)
4235
89.3%
/510
 
10.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII94136
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e11356
12.1%
n10724
 
11.4%
a8361
 
8.9%
d7853
 
8.3%
o5651
 
6.0%
i4850
 
5.2%
t4480
 
4.8%
c4334
 
4.6%
v4328
 
4.6%
4235
 
4.5%
Other values (19)27964
29.7%

type_of_property_seized
Categorical

HIGH CORRELATION
MISSING

Distinct: 13
Distinct (%): 0.2%
Missing: 348426
Missing (%): 97.8%
Memory size: 2.7 MiB
Drugs/narcotics
2699 
Drug Paraphernalia
1316 
Vehicle
1018 
Other Contraband or evidence
606 
Weapon(s) other than a firearm
595 
Other values (8)
1546 

Length

Max length: 37
Median length: 15
Mean length: 16.80848329
Min length: 5

Characters and Unicode

Total characters: 130770
Distinct characters: 40
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Vehicle
2nd row: Weapon(s) other than a firearm
3rd row: Vehicle
4th row: Drug Paraphernalia
5th row: Drug Paraphernalia

Common Values

Value | Count | Frequency (%)
Drugs/narcotics | 2699 | 0.8%
Drug Paraphernalia | 1316 | 0.4%
Vehicle | 1018 | 0.3%
Other Contraband or evidence | 606 | 0.2%
Weapon(s) other than a firearm | 595 | 0.2%
Firearm(s) | 412 | 0.1%
Alcohol | 395 | 0.1%
Cell phone(s) or electronic device(s) | 284 | 0.1%
Suspected Stolen property | 231 | 0.1%
Money | 149 | < 0.1%
Other values (3) | 75 | < 0.1%
(Missing) | 348426 | 97.8%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
drugs/narcotics2699
18.1%
paraphernalia1316
 
8.8%
drug1316
 
8.8%
other1201
 
8.1%
vehicle1018
 
6.8%
or890
 
6.0%
contraband606
 
4.1%
evidence606
 
4.1%
than595
 
4.0%
a595
 
4.0%
Other values (15)4050
27.2%

Most occurring characters

ValueCountFrequency (%)
r14803
 
11.3%
a11967
 
9.2%
e10750
 
8.2%
c8500
 
6.5%
n8117
 
6.2%
o7427
 
5.7%
i7360
 
5.6%
s7204
 
5.5%
7112
 
5.4%
t6151
 
4.7%
Other values (30)41379
31.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter107868
82.5%
Uppercase Letter9931
 
7.6%
Space Separator7112
 
5.4%
Other Punctuation2699
 
2.1%
Open Punctuation1575
 
1.2%
Close Punctuation1575
 
1.2%
Decimal Number10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r14803
13.7%
a11967
11.1%
e10750
10.0%
c8500
7.9%
n8117
 
7.5%
o7427
 
6.9%
i7360
 
6.8%
s7204
 
6.7%
t6151
 
5.7%
h4809
 
4.5%
Other values (10)20780
19.3%
Uppercase Letter
ValueCountFrequency (%)
D4015
40.4%
P1316
 
13.3%
V1018
 
10.3%
C890
 
9.0%
O606
 
6.1%
W595
 
6.0%
A468
 
4.7%
S462
 
4.7%
F412
 
4.1%
M149
 
1.5%
Decimal Number
ValueCountFrequency (%)
53
30.0%
42
20.0%
12
20.0%
01
 
10.0%
61
 
10.0%
71
 
10.0%
Open Punctuation
ValueCountFrequency (%)
(1575
100.0%
Close Punctuation
ValueCountFrequency (%)
)1575
100.0%
Space Separator
ValueCountFrequency (%)
7112
100.0%
Other Punctuation
ValueCountFrequency (%)
/2699
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin117799
90.1%
Common12971
 
9.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
r14803
12.6%
a11967
 
10.2%
e10750
 
9.1%
c8500
 
7.2%
n8117
 
6.9%
o7427
 
6.3%
i7360
 
6.2%
s7204
 
6.1%
t6151
 
5.2%
h4809
 
4.1%
Other values (20)30711
26.1%
Common
ValueCountFrequency (%)
7112
54.8%
/2699
 
20.8%
(1575
 
12.1%
)1575
 
12.1%
53
 
< 0.1%
42
 
< 0.1%
12
 
< 0.1%
01
 
< 0.1%
61
 
< 0.1%
71
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII130770
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r14803
 
11.3%
a11967
 
9.2%
e10750
 
8.2%
c8500
 
6.5%
n8117
 
6.2%
o7427
 
5.7%
i7360
 
5.6%
s7204
 
5.5%
7112
 
5.4%
t6151
 
4.7%
Other values (30)41379
31.6%

result_key
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

result
Categorical

HIGH CORRELATION

Distinct: 13
Distinct (%): < 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 2.7 MiB
Citation for infraction
82209 
Field interview card completed
76131 
Warning (verbal or written)
60128 
No Action
49436 
Custodial Arrest without warrant
33364 
Other values (8)
54936 

Length

Max length73
Median length27
Mean length25.34757611
Min length9

Characters and Unicode

Total characters9028908
Distinct characters40
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCustodial Arrest without warrant
2nd rowWarning (verbal or written)
3rd rowNo Action
4th rowNo Action
5th rowNo Action

Common Values

Value | Count | Frequency (%)
Citation for infraction | 82209 | 23.1%
Field interview card completed | 76131 | 21.4%
Warning (verbal or written) | 60128 | 16.9%
No Action | 49436 | 13.9%
Custodial Arrest without warrant | 33364 | 9.4%
In-field cite and release | 20791 | 5.8%
Psychiatric hold | 13874 | 3.9%
Custodial Arrest pursuant to outstanding warrant | 13001 | 3.6%
Noncriminal transport or caretaking transport | 6314 | 1.8%
Contacted parent/legal guardian or other person responsible for the minor | 911 | 0.3%
Other values (3) | 45 | < 0.1%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
for83120
 
6.6%
citation82209
 
6.6%
infraction82209
 
6.6%
field76131
 
6.1%
card76131
 
6.1%
interview76131
 
6.1%
completed76131
 
6.1%
or67358
 
5.4%
verbal60128
 
4.8%
written60128
 
4.8%
Other values (38)514137
41.0%

Most occurring characters

ValueCountFrequency (%)
i910724
10.1%
897609
9.9%
t897362
9.9%
r841970
 
9.3%
n716633
 
7.9%
e664969
 
7.4%
o633184
 
7.0%
a617613
 
6.8%
c346047
 
3.8%
d345089
 
3.8%
Other values (30)2157708
23.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7537252
83.5%
Space Separator897609
 
9.9%
Uppercase Letter452065
 
5.0%
Open Punctuation60128
 
0.7%
Close Punctuation60128
 
0.7%
Dash Punctuation20791
 
0.2%
Other Punctuation935
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i910724
12.1%
t897362
11.9%
r841970
11.2%
n716633
9.5%
e664969
8.8%
o633184
8.4%
a617613
8.2%
c346047
 
4.6%
d345089
 
4.6%
l323341
 
4.3%
Other values (12)1240320
16.5%
Uppercase Letter
ValueCountFrequency (%)
C129497
28.6%
A95801
21.2%
F76131
16.8%
W60128
13.3%
N55750
12.3%
I20791
 
4.6%
P13874
 
3.1%
R33
 
< 0.1%
S24
 
< 0.1%
U12
 
< 0.1%
Other values (2)24
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/911
97.4%
.24
 
2.6%
Space Separator
ValueCountFrequency (%)
897609
100.0%
Open Punctuation
ValueCountFrequency (%)
(60128
100.0%
Close Punctuation
ValueCountFrequency (%)
)60128
100.0%
Dash Punctuation
ValueCountFrequency (%)
-20791
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7989317
88.5%
Common1039591
 
11.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i910724
11.4%
t897362
11.2%
r841970
10.5%
n716633
9.0%
e664969
 
8.3%
o633184
 
7.9%
a617613
 
7.7%
c346047
 
4.3%
d345089
 
4.3%
l323341
 
4.0%
Other values (24)1692385
21.2%
Common
ValueCountFrequency (%)
897609
86.3%
(60128
 
5.8%
)60128
 
5.8%
-20791
 
2.0%
/911
 
0.1%
.24
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII9028908
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i910724
10.1%
897609
9.9%
t897362
9.9%
r841970
 
9.3%
n716633
 
7.9%
e664969
 
7.4%
o633184
 
7.0%
a617613
 
6.8%
c346047
 
3.8%
d345089
 
3.8%
Other values (30)2157708
23.9%

code
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1261
Distinct (%): 0.4%
Missing: 2
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 28709.69618
Minimum: 0
Maximum: 99999
Zeros: 159725
Zeros (%): 44.8%
Negative: 0
Negative (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 32111
Q3: 54194
95-th percentile: 65002
Maximum: 99999
Range: 99999
Interquartile range (IQR): 54194

Descriptive statistics

Standard deviation: 27675.67946
Coefficient of variation (CV): 0.9639837111
Kurtosis: -1.672744757
Mean: 28709.69618
Median Absolute Deviation (MAD): 32111
Skewness: 0.1030210085
Sum: 1.022650862 × 10^10
Variance: 765943233.8
Monotonicity: Not monotonic
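
Several of the figures above are simple functions of the others, which gives a quick consistency check. The sketch below recomputes them from the reported mean, standard deviation, quartiles and extremes. The variable names are illustrative, and small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the statistics reported for the "code" variable.
mean = 28709.69618
std_dev = 27675.67946
q1, q3 = 0, 54194
minimum, maximum = 0, 99999

# Coefficient of variation: standard deviation relative to the mean.
coefficient_of_variation = std_dev / mean
# Variance is the square of the standard deviation.
variance = std_dev ** 2
# Interquartile range and full range.
iqr = q3 - q1
value_range = maximum - minimum

print(f"CV       = {coefficient_of_variation:.10f}")
print(f"Variance = {variance:.1f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
```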
[Figure: histogram with fixed size bins (bins=50)]
Value | Count | Frequency (%)
0 | 159725 | 44.8%
65002 | 33543 | 9.4%
54167 | 8822 | 2.5%
54655 | 7569 | 2.1%
54106 | 7227 | 2.0%
54146 | 6830 | 1.9%
32111 | 5801 | 1.6%
65000 | 5360 | 1.5%
64005 | 4984 | 1.4%
32022 | 4667 | 1.3%
Other values (1251) | 111676 | 31.4%
ValueCountFrequency (%)
0159725
44.8%
33
 
< 0.1%
40212
 
< 0.1%
40222
 
< 0.1%
40312
 
< 0.1%
40321
 
< 0.1%
40332
 
< 0.1%
40341
 
< 0.1%
40373
 
< 0.1%
404321
 
< 0.1%
ValueCountFrequency (%)
999991121
0.3%
99990550
0.2%
891053
 
< 0.1%
890051
 
< 0.1%
6621124
 
< 0.1%
6621095
 
< 0.1%
662081360
0.4%
66207196
 
0.1%
66205433
 
0.1%
662046
 
< 0.1%

result_text
Categorical

HIGH CARDINALITY
MISSING

Distinct: 1260
Distinct (%): 0.6%
Missing: 159727
Missing (%): 44.8%
Memory size: 2.7 MiB
65002 ZZ - LOCAL ORDINANCE VIOL (I) 65002
33543 
22450(A) VC - FAIL STOP VEH:XWALK/ETC (I) 54167
 
8822
23123.5 VC - NO HND HLD DEVICE W/DRIVE (I) 54655
 
7569
22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106
 
7227
21461(A) VC - DRIVER FAIL OBEY SIGN/ETC (I) 54146
 
6830
Other values (1255)
132488 

Length

Max length: 56
Median length: 46
Mean length: 44.71599
Min length: 24

Characters and Unicode

Total characters: 8785753
Distinct characters: 49
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 326
Unique (%): 0.2%

Sample

1st row: 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
2nd row: 22349(B) VC - EXC 55MPH SPEED:2 LANE RD (I) 54395
3rd row: 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
4th row: 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
5th row: 245(A)(1) PC - ADW NOT FIREARM (F) 13219

Common Values

Value | Count | Frequency (%)
65002 ZZ - LOCAL ORDINANCE VIOL (I) 65002 | 33543 | 9.4%
22450(A) VC - FAIL STOP VEH:XWALK/ETC (I) 54167 | 8822 | 2.5%
23123.5 VC - NO HND HLD DEVICE W/DRIVE (I) 54655 | 7569 | 2.1%
22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106 | 7227 | 2.0%
21461(A) VC - DRIVER FAIL OBEY SIGN/ETC (I) 54146 | 6830 | 1.9%
647(E) PC - DIS CON:LODGE W/O CONSENT (M) 32111 | 5801 | 1.6%
65000 ZZ - LOCAL ORDINANCE VIOL (M) 65000 | 5360 | 1.5%
647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005 | 4984 | 1.4%
602 PC - TRESPASSING (M) 32022 | 4667 | 1.3%
25620 BP - POSS OPEN ALCOHOL:PUBLIC (I) 41063 | 4449 | 1.2%
Other values (1250) | 107227 | 30.1%
(Missing) | 159727 | 44.8%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
198310
 
12.2%
i132740
 
8.1%
vc99844
 
6.1%
6500267086
 
4.1%
m50629
 
3.1%
viol43218
 
2.6%
zz40577
 
2.5%
pc40235
 
2.5%
local38903
 
2.4%
ordinance38903
 
2.4%
Other values (4435)881388
54.0%

Most occurring characters

ValueCountFrequency (%)
1435354
 
16.3%
I444256
 
5.1%
E387229
 
4.4%
C383546
 
4.4%
O352675
 
4.0%
A346513
 
3.9%
0344332
 
3.9%
5325492
 
3.7%
L322867
 
3.7%
(319667
 
3.6%
Other values (39)4123822
46.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter4431677
50.4%
Decimal Number1906415
21.7%
Space Separator1435354
 
16.3%
Open Punctuation319667
 
3.6%
Close Punctuation319622
 
3.6%
Dash Punctuation198874
 
2.3%
Other Punctuation173505
 
2.0%
Currency Symbol506
 
< 0.1%
Math Symbol133
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I444256
 
10.0%
E387229
 
8.7%
C383546
 
8.7%
O352675
 
8.0%
A346513
 
7.8%
L322867
 
7.3%
N266929
 
6.0%
V225202
 
5.1%
S217133
 
4.9%
D210411
 
4.7%
Other values (16)1274916
28.8%
Decimal Number
ValueCountFrequency (%)
0344332
18.1%
5325492
17.1%
2303800
15.9%
4233800
12.3%
1223130
11.7%
6205744
10.8%
3137177
 
7.2%
757832
 
3.0%
940141
 
2.1%
834967
 
1.8%
Other Punctuation
ValueCountFrequency (%)
/88580
51.1%
:66678
38.4%
.16533
 
9.5%
&1493
 
0.9%
'205
 
0.1%
"14
 
< 0.1%
,2
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
(319667
100.0%
Close Punctuation
ValueCountFrequency (%)
)319622
100.0%
Space Separator
ValueCountFrequency (%)
1435354
100.0%
Dash Punctuation
ValueCountFrequency (%)
-198874
100.0%
Currency Symbol
ValueCountFrequency (%)
$506
100.0%
Math Symbol
ValueCountFrequency (%)
+133
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4431677
50.4%
Common4354076
49.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
I444256
 
10.0%
E387229
 
8.7%
C383546
 
8.7%
O352675
 
8.0%
A346513
 
7.8%
L322867
 
7.3%
N266929
 
6.0%
V225202
 
5.1%
S217133
 
4.9%
D210411
 
4.7%
Other values (16)1274916
28.8%
Common
ValueCountFrequency (%)
1435354
33.0%
0344332
 
7.9%
5325492
 
7.5%
(319667
 
7.3%
)319622
 
7.3%
2303800
 
7.0%
4233800
 
5.4%
1223130
 
5.1%
6205744
 
4.7%
-198874
 
4.6%
Other values (13)444261
 
10.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII8785753
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1435354
 
16.3%
I444256
 
5.1%
E387229
 
4.4%
C383546
 
4.4%
O352675
 
4.0%
A346513
 
3.9%
0344332
 
3.9%
5325492
 
3.7%
L322867
 
3.7%
(319667
 
3.6%
Other values (39)4123822
46.9%

Interactions

[Figures: 81 interaction plots; images not reproduced]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
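
The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` and the toy data are illustrative and are not taken from this dataset. The 1/n factors in the covariance and the standard deviations cancel, so plain sums suffice.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; common 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)

# Shifting and (positively) rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([10 * x + 5 for x in xs], ys)
print(r, r_rescaled)
```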

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
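
The rank-based calculation above can be sketched as follows, again with the standard library only. The helper names (`average_ranks`, `spearman_rho`) are hypothetical, and giving tied values the average of their ranks is an assumption; it is a common convention, not something stated in this report.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)


# Hypothetical values: y = x**3 is monotonic but not linear in x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```

Because only the ordering of the values matters, a strictly increasing nonlinear relationship such as the cubic above still yields ρ = 1 (up to floating-point rounding).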

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
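
The pair-counting definition above can be sketched directly. The function name `kendall_tau` is hypothetical. The sketch divides by the total number of pairs exactly as described; tie-corrected variants of τ exist, and whether the report's figures use one is not stated here.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, as described above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # product == 0: the pair is tied in x or y and counts as neither
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical values: two concordant pairs and one discordant pair.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

The double loop over pairs is quadratic in the number of observations, so this form is only practical for small samples.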

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
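
To make the nullity correlation mentioned above concrete, here is a minimal sketch. It assumes the correlation is a Pearson correlation between per-column presence indicators (1 if a value is present, 0 if it is missing). That is an assumption about the method, not something this report confirms. The function name and the sample columns are hypothetical, and missing values are represented by `None`.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n
    std_a = sqrt(sum((p - mean_a) ** 2 for p in a) / n)
    std_b = sqrt(sum((q - mean_b) ** 2 for q in b) / n)
    if std_a == 0 or std_b == 0:
        raise ValueError("undefined when a column is fully present or fully missing")
    return cov / (std_a * std_b)


# Hypothetical columns, not taken from the dataset.
street = ["Grand Avenue", None, "Thomas", None]
block = [700, None, 800, None]          # missing exactly where street is missing
intersection = [None, "I-5", None, "sr-54"]  # present exactly where street is missing

print(nullity_correlation(street, block))         # missing together
print(nullity_correlation(street, intersection))  # one present when the other is missing
```

Under this assumption, a value near +1 means two columns tend to be missing in the same rows, and a value near -1 means one tends to be filled in exactly when the other is missing.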

Sample

First rows

(index) | stop_id | date_stop | time_stop | stop_duration | stop_in_response_to_cfs | address_city | beat | beat_name | highway_exit | address_street | intersection | address_block | landmark | is_school | school_name | ori | agency | officer_assignment_key | assignment | exp_years | pid | is_student | perceived_limited_english | perceived_age | gender2 | perceived_gender | gender_nc | gender_non_conforming | gender | perceived_lgbt | race | disability | reason_for_stop_code | reason_for_stop_code_text | reason_for_stop | reason_for_stop_detail | reason_for_stop_explanation | action | consented | basis_for_search | basis_for_search_explanation | basis_for_property_seizure | type_of_property_seized | result_key | result | code | result_text
0 | 2443 | 2018-07-01 | 00:01:37 | 30 | 0 | SAN DIEGO | 122 | Pacific Beach 122 | NaN | Grand Avenue | NaN | 700 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 10 | 1 | 0 | 0 | 25 | 1 | Male | 0 | 0 | Male | No | White | None | 64005 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005 | Reasonable Suspicion | Officer witnessed commission of a crime | staggering, unable to safely walk | None | NaN | NaN | NaN | NaN | NaN | 6 | Custodial Arrest without warrant | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
1 | 2444 | 2018-07-01 | 00:03:34 | 10 | 0 | SAN DIEGO | 121 | Mission Beach 121 | NaN | NOBEL DRIVE | I-5 | 0 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 18 | 1 | 0 | 0 | 25 | 1 | Male | 0 | 0 | Male | No | White | None | 54106 | 22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106 | Traffic Violation | Moving Violation | Speeding | None | NaN | NaN | NaN | NaN | NaN | 2 | Warning (verbal or written) | 54395.0 | 22349(B) VC - EXC 55MPH SPEED:2 LANE RD (I) 54395
2 | 2447 | 2018-07-01 | 00:05:43 | 15 | 1 | SAN DIEGO | 822 | El Cerrito 822 | NaN | 59th Street | NaN | 4400 | NaN | 0 | NaN | CA0371100 | SD | 10 | Other | 1 | 1 | 0 | 0 | 30 | 1 | Male | 0 | 0 | Male | No | Hispanic/Latino/a | None | 53072 | 415(1) PC - FIGHT IN PUBLIC PLACE (M) 53072 | Reasonable Suspicion | Matched suspect description | Both parties involved in argument. | Curbside detention | NaN | NaN | NaN | NaN | NaN | 1 | No Action | 0.0 | NaN
3 | 2447 | 2018-07-01 | 00:05:43 | 15 | 1 | SAN DIEGO | 822 | El Cerrito 822 | NaN | 59th Street | NaN | 4400 | NaN | 0 | NaN | CA0371100 | SD | 10 | Other | 1 | 2 | 0 | 0 | 30 | 2 | Female | 0 | 0 | Female | No | Hispanic/Latino/a | None | 53072 | 415(1) PC - FIGHT IN PUBLIC PLACE (M) 53072 | Reasonable Suspicion | Other Reasonable Suspicion of a crime | Both parties engaged in argument. | Curbside detention | NaN | NaN | NaN | NaN | NaN | 1 | No Action | 0.0 | NaN
4 | 2448 | 2018-07-01 | 00:19:06 | 5 | 0 | SAN DIEGO | 614 | Ocean Beach 614 | NaN | NIAGARA AVE | NaN | 4800 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 3 | 1 | 0 | 0 | 23 | 1 | Male | 0 | 0 | Male | No | White | None | 54106 | 22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106 | Traffic Violation | Moving Violation | UNSAFE DRIVING | None | NaN | NaN | NaN | NaN | NaN | 1 | No Action | 0.0 | NaN
5 | 2449 | 2018-07-01 | 00:03:00 | 15 | 1 | SAN DIEGO | 115 | University City 115 | NaN | la jolla village dr | NaN | 4500 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 1 | 1 | 0 | 0 | 25 | 1 | Male | 0 | 0 | Male | No | White | None | 13045 | 242 PC - BATTERY (M) 13045 | Reasonable Suspicion | Matched suspect description | matched description of suspect on 242 radio call. | Curbside detention | NaN | Incident to arrest | subject was transported to detox and was searched accordingly. | NaN | NaN | 6 | Custodial Arrest without warrant | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
6 | 2451 | 2018-07-01 | 00:24:02 | 20 | 0 | SAN DIEGO | 122 | Pacific Beach 122 | NaN | Thomas | NaN | 800 | NaN | 0 | NaN | CA0371100 | SD | 10 | Other | 24 | 1 | 0 | 0 | 22 | 1 | Male | 0 | 0 | Male | No | White | None | 64005 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005 | Reasonable Suspicion | Other Reasonable Suspicion of a crime | drunk | Handcuffed or flex cuffed | NaN | NaN | NaN | NaN | NaN | 6 | Custodial Arrest without warrant | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
7 | 2453 | 2018-07-01 | 00:31:19 | 120 | 0 | SAN DIEGO | 446 | Lincoln Park 446 | NaN | LOGAN AVENUE | NaN | 4800 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 1 | 1 | 0 | 0 | 37 | 1 | Male | 0 | 0 | Male | No | Black/African American | None | 13219 | 245(A)(1) PC - ADW NOT FIREARM (F) 13219 | Reasonable Suspicion | Matched suspect description | ARREST CONNECTED WITH 245A1, 594B1 | Property was seized | NaN | Incident to arrest | INCIDENT TO ARREST | Impound of vehicle | Vehicle | 6 | Custodial Arrest without warrant | 13219.0 | 245(A)(1) PC - ADW NOT FIREARM (F) 13219
8 | 2454 | 2018-07-01 | 00:33:19 | 3 | 0 | SAN DIEGO | 826 | Colina Del Sol 826 | NaN | estrella | NaN | 4100 | NaN | 0 | NaN | CA0371100 | SD | 2 | Gang enforcement | 4 | 1 | 0 | 0 | 30 | 1 | Male | 0 | 0 | Male | No | Black/African American | None | 54099 | 4000(A) VC - NO REG:VEH/TRAILER/ETC (I) 54099 | Traffic Violation | Non-moving Violation, including Registration Violation | Pursuit of justice | None | NaN | NaN | NaN | NaN | NaN | 1 | No Action | 0.0 | NaN
9 | 2455 | 2018-07-01 | 00:11:00 | 20 | 1 | SAN DIEGO | 122 | Pacific Beach 122 | NaN | hornblend st | NaN | 1100 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 3 | 1 | 0 | 0 | 28 | 1 | Male | 0 | 0 | Male | No | Black/African American | None | 53072 | 415(1) PC - FIGHT IN PUBLIC PLACE (M) 53072 | Reasonable Suspicion | Matched suspect description | matched desc suspect seen fighting | Handcuffed or flex cuffed | NaN | NaN | NaN | NaN | NaN | 6 | Custodial Arrest without warrant | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005

Last rows

(index) | stop_id | date_stop | time_stop | stop_duration | stop_in_response_to_cfs | address_city | beat | beat_name | highway_exit | address_street | intersection | address_block | landmark | is_school | school_name | ori | agency | officer_assignment_key | assignment | exp_years | pid | is_student | perceived_limited_english | perceived_age | gender2 | perceived_gender | gender_nc | gender_non_conforming | gender | perceived_lgbt | race | disability | reason_for_stop_code | reason_for_stop_code_text | reason_for_stop | reason_for_stop_detail | reason_for_stop_explanation | action | consented | basis_for_search | basis_for_search_explanation | basis_for_property_seizure | type_of_property_seized | result_key | result | code | result_text
356196 | 324687 | 2020-06-30 | 19:40:00 | 120 | 0 | SAN DIEGO | 826 | Colina Del Sol 826 | NaN | EL CAJON BOULEVARD | NaN | 5200 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 5 | 1 | 0 | 0 | 50 | 1 | Male | 0 | 0 | Male | No | Black/African American | None | 12004 | 211 PC - ROBBERY (F) 12004 | Reasonable Suspicion | Matched suspect description | SUBJECT MATCHED THE DESCRIPTION OF THE SUSPECT GIVEN BY THE VICTIMS | Handcuffed or flex cuffed | NaN | NaN | NaN | NaN | NaN | 6 | Custodial Arrest without warrant | 12004.0 | 211 PC - ROBBERY (F) 12004
356197 | 324690 | 2020-06-30 | 23:25:00 | 10 | 0 | SAN DIEGO | 999 | Unknown 999 | NB 805 AT PLAZA BLVD | NaN | NaN | 0 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 15 | 1 | 0 | 0 | 22 | 1 | Male | 0 | 0 | Male | No | Hispanic/Latino/a | None | 54303 | 22349(A) VC - EXCEED SPEED ON HIGHWAY (I) 54303 | Traffic Violation | Moving Violation | MAX SPEED 65 MPH | None | NaN | NaN | NaN | NaN | NaN | 3 | Citation for infraction | 54303.0 | 22349(A) VC - EXCEED SPEED ON HIGHWAY (I) 54303
356198 | 324691 | 2020-06-30 | 23:50:00 | 12 | 0 | SAN DIEGO | 999 | Unknown 999 | NB I-15 AT I-8 | NaN | NaN | 0 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 15 | 1 | 0 | 0 | 25 | 1 | Male | 0 | 0 | Male | No | Hispanic/Latino/a | None | 54303 | 22349(A) VC - EXCEED SPEED ON HIGHWAY (I) 54303 | Traffic Violation | Moving Violation | MAX SPEED 65 MPH | None | NaN | NaN | NaN | NaN | NaN | 2 | Warning (verbal or written) | 54303.0 | 22349(A) VC - EXCEED SPEED ON HIGHWAY (I) 54303
356199 | 324697 | 2020-06-30 | 11:41:32 | 120 | 0 | SAN DIEGO | 611 | Midway District 611 | NaN | laning road | NaN | 2500 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 1 | 1 | 0 | 0 | 40 | 1 | Male | 0 | 0 | Male | Yes | Hispanic/Latino/a | None | 53074 | 415(3) PC - OFFENSIVE WORDS:PUBLIC PL (M) 53074 | Reasonable Suspicion | Matched suspect description | subject matched description of a radio call regarding a disturbance at a hotel | Handcuffed or flex cuffed | NaN | NaN | NaN | NaN | NaN | 8 | Noncriminal transport or caretaking transport | 0.0 | NaN
356200 | 324701 | 2020-06-30 | 22:38:42 | 12 | 0 | SAN DIEGO | 712 | San Ysidro 712 | NaN | e san ysidro | NaN | 700 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 1 | 1 | 0 | 0 | 34 | 1 | Male | 0 | 0 | Male | No | Black/African American | None | 22004 | 459 PC - BURGLARY (F) 22004 | Reasonable Suspicion | Matched suspect description | burglary call | Handcuffed or flex cuffed | NaN | NaN | NaN | NaN | NaN | 1 | No Action | 0.0 | NaN
356201 | 324712 | 2020-06-30 | 23:34:00 | 20 | 0 | SAN DIEGO | 514 | Sherman Heights 514 | NaN | 25th | NaN | 100 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 1 | 1 | 0 | 0 | 42 | 1 | Male | 0 | 0 | Male | No | Hispanic/Latino/a | None | 0 | NaN | Investigation to determine whether the person was truant | NaN | call for service regarding setting off fireworks | Curbside detention | NaN | NaN | NaN | NaN | NaN | 1 | No Action | 0.0 | NaN
356202 | 324712 | 2020-06-30 | 23:34:00 | 20 | 0 | SAN DIEGO | 514 | Sherman Heights 514 | NaN | 25th | NaN | 100 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 1 | 2 | 0 | 0 | 22 | 1 | Male | 0 | 0 | Male | No | Hispanic/Latino/a | None | 0 | NaN | Investigation to determine whether the person was truant | NaN | call for service regarding setting off fireworks | Curbside detention | NaN | NaN | NaN | NaN | NaN | 1 | No Action | 0.0 | NaN
356203 | 324712 | 2020-06-30 | 23:34:00 | 20 | 0 | SAN DIEGO | 514 | Sherman Heights 514 | NaN | 25th | NaN | 100 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 1 | 3 | 0 | 0 | 30 | 1 | Male | 0 | 0 | Male | No | Hispanic/Latino/a | None | 0 | NaN | Investigation to determine whether the person was truant | NaN | call for service regarding setting off fireworks | Curbside detention | NaN | NaN | NaN | NaN | NaN | 1 | No Action | 0.0 | NaN
356204 | 324715 | 2020-06-30 | 15:25:00 | 11 | 0 | SAN DIEGO | 723 | Otay Mesa West 723 | NaN | NaN | 5 south / sr-54 | 0 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 11 | 1 | 0 | 0 | 30 | 1 | Male | 0 | 0 | Male | No | Hispanic/Latino/a | None | 54181 | 21755 VC - USE SHOLDER/ETC:PAS RIGHT (I) 54181 | Traffic Violation | Moving Violation | drive on right shoulder | None | NaN | NaN | NaN | NaN | NaN | 3 | Citation for infraction | 54181.0 | 21755 VC - USE SHOLDER/ETC:PAS RIGHT (I) 54181
356205 | 324716 | 2020-06-30 | 15:45:00 | 12 | 0 | LA MESA | 999 | Unknown 999 | NaN | NaN | sr-125 north / 94 mege | 0 | NaN | 0 | NaN | CA0371100 | SD | 1 | Patrol, traffic enforcement, field operations | 11 | 1 | 0 | 0 | 26 | 1 | Male | 0 | 0 | Male | No | White | None | 54655 | 23123.5 VC - NO HND HLD DEVICE W/DRIVE (I) 54655 | Traffic Violation | Moving Violation | cell phone | None | NaN | NaN | NaN | NaN | NaN | 3 | Citation for infraction | 54655.0 | 23123.5 VC - NO HND HLD DEVICE W/DRIVE (I) 54655